Guide
This tutorial and the associated technical appendix are adapted from the BMD (BIOMED) statistical package documentation for the BMD08M factor analysis program.
Documentation and the BMD08M program were developed under a National Science Foundation grant.
Factor analysis is a data reduction technique for identifying the internal structure of a set of variables. Unlike techniques such as regression analysis or ANOVA, factor analysis does not require that predictor and criterion variables be defined. Instead, it attempts to identify the relationships among all variables included in the analysis set.
Factor analysis is decompositional in nature in that it identifies the underlying relationships that exist within a set of variables. Factor analysis creates groups of metric variables (interval or ratio scaled) called factors. A factor is an underlying quality found to be characteristic of the original variables. Two types of factors exist. Common factors have effects shared in common with more than one observed variable. Unique factors have effects that are unique to a specific variable.
The basic objectives of a Factor Analysis are:
- To determine how many factors are needed to explain the set of variables.
- To find the extent to which each variable is associated with each of a set of common factors.
- To provide interpretation to the common factors.
- To determine the amount of each factor possessed by each observation (identified by the factor scores).
In summary, the goal is to explain a portion of the variance in the set of input variables by identifying certain underlying common dimensions, called factors. Factor analysis helps identify this set of k dimensions underlying the m variables in a data set (where k < m).
79652 55462 12345 16523 46525 79665 65321 98653 46521 65435
32165 56523 65454 16589 98965 73195 15937 35079 62486 46428
These data represent the scores (0 to 9 scale) of 20 students on five finals (e.g. Math, English, History, Geography, Science); each five-digit group holds one student's five scores. Can we say that the students' exam grades in the different subjects are related? The relationships among the grades are not directly measurable; they are latent. Grades in different courses could be related because of the student's intellectual capabilities, memory capacity, or simply interest. Although one person's test grades may not be perfectly correlated with one another, grades in all subject areas should depend to some degree on general intelligence or on other factors common to learning the subject material. Accordingly, we may identify one or more factors that explain the "common" portion of the variance in the original raw scores.
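As a first look at whether the grades are related, the correlation matrix of the five exams can be computed directly from the data above. The following is a minimal sketch in plain Python; the subject labels follow the "e.g." in the text and are illustrative, and the helper name `correlation_matrix` is an assumption introduced here.

```python
from statistics import mean, pstdev

# The 20 five-digit groups from the text; each digit is one exam score (0-9).
RAW = """
79652 55462 12345 16523 46525 79665 65321 98653 46521 65435
32165 56523 65454 16589 98965 73195 15937 35079 62486 46428
"""

# Illustrative labels only; the text gives them as an example.
SUBJECTS = ["Math", "English", "History", "Geography", "Science"]

# One row per student, one column per exam.
scores = [[int(d) for d in group] for group in RAW.split()]


def correlation_matrix(rows):
    """Pearson correlations between the columns of `rows`."""
    cols = list(zip(*rows))
    n = len(rows)
    means = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    # Standardize each column, then average the cross-products.
    z = [[(x - m) / s for x in c] for c, m, s in zip(cols, means, sds)]
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


R = correlation_matrix(scores)
for name, row in zip(SUBJECTS, R):
    print(f"{name:<10}" + " ".join(f"{r:6.2f}" for r in row))
```

Each row of the printout shows how strongly one exam correlates with the others; this matrix is the usual starting point for the analyses defined below.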
GRAPHICAL PORTRAYAL OF MODES OF FACTOR ANALYSIS
The alternative modes of factor analysis can be portrayed graphically. The original data set is viewed as a variables-persons-occasions matrix. R-Type and Q-Type techniques deal with the variables-persons dichotomy. In contrast, P-Type and O-Type analyses are used for the occasions-variables situation, and S-Type and T-Type are used when the occasions-persons relationship is of interest (c).
COMMON FACTOR ANALYSIS
Factor analysis based upon a correlation matrix with values less than 1.0 on the diagonal. The values on the diagonal are known as communalities; they are inserted so that the factor analysis solves only for the common variance (excluding specific and error variance).
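In matrix terms, this definition can be sketched as follows for uncorrelated factors. The symbols are notation assumed here, not taken from the BMD documentation: R is the correlation matrix, A the matrix of loadings a_ij, U^2 the diagonal matrix of unique variances, and h_i^2 the communality of variable i.

```latex
\begin{aligned}
R &= A A^{\top} + U^{2}, \\
R - U^{2} &= A A^{\top}, \\
\bigl(R - U^{2}\bigr)_{ii} &= h_i^{2} = 1 - u_i^{2} .
\end{aligned}
```

The matrix R - U^2 is the reduced correlation matrix: identical to R off the diagonal, with the communalities replacing the 1.0 entries on the diagonal.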
COMMUNALITY
The amount of variance in the variable shared with all other variables.
PRINCIPAL COMPONENTS ANALYSIS
One variety of factor analysis. The factors are based upon an analysis of the total variance in the original data. In application, this means that the factor analysis begins with a correlation matrix that has the value 1 on the diagonal. Computationally, this implies that 100% of the variance is treated as common, or shared, among the variables. Other forms of factor analysis may begin with other values on the diagonal that reflect the amount of variance expected to be explained for each variable.
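The following sketch illustrates the principal components case on a small correlation matrix with 1 on the diagonal. The matrix `R` is hypothetical, and the cyclic Jacobi routine is one simple way to obtain eigenvalues in plain Python; it is not claimed to be the method used by BMD08M.

```python
import math


def jacobi_eigen(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi)."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes the (p, q) off-diagonal element.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    x, y = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * x - s * y, s * x + c * y
                for k in range(n):  # rotate rows p and q
                    x, y = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * x - s * y, s * x + c * y
                for k in range(n):  # accumulate eigenvectors as columns
                    x, y = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * x - s * y, s * x + c * y
    order = sorted(range(n), key=lambda i: -a[i][i])
    values = [a[i][i] for i in order]
    vectors = [[v[i][j] for j in order] for i in range(n)]
    return values, vectors


# Hypothetical correlation matrix for three variables.
R = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.2],
     [0.3, 0.2, 1.0]]

eigenvalues, vectors = jacobi_eigen(R)
# Loadings: each eigenvector scaled by the square root of its eigenvalue.
loadings = [[vectors[i][j] * math.sqrt(max(eigenvalues[j], 0.0))
             for j in range(len(R))] for i in range(len(R))]

total = sum(eigenvalues)  # equals the number of variables (the trace)
for j, ev in enumerate(eigenvalues):
    print(f"factor {j + 1}: eigenvalue {ev:.3f}, proportion {ev / total:.3f}")
```

Because the diagonal holds 1 for every variable, the eigenvalues sum to the number of variables, and each eigenvalue divided by that sum is the proportion of total variance attributed to its factor.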
CORRELATION MATRIX
A table showing the intercorrelations among all variables analyzed.
EIGENVALUE
The sum of squares of the loadings in a column in the factor matrix. Eigenvalues are also referred to as latent roots and represent the amount of variance accounted for by a factor.
FACTOR
The smaller set of underlying composite dimensions of all variables in the data set. Factors are linear combinations of the original variables.
FACTOR LOADINGS
These are the correlation coefficients between the variables and the factors. The variables with the highest correlations provide the most meaning (in an interpretation sense) to the factor solution. The squared loadings for a given factor sum to the eigenvalue for that factor.
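The two ways of summing squared loadings can be written compactly. In the sketch below, a_ij denotes the loading of variable i on factor j, with m variables and k factors; the symbols lambda_j (eigenvalue) and h_i^2 (communality) are notation assumed here, and the identities hold for an unrotated orthogonal solution.

```latex
\lambda_j = \sum_{i=1}^{m} a_{ij}^{2}
\quad\text{(down a column)},
\qquad
h_i^{2} = \sum_{j=1}^{k} a_{ij}^{2}
\quad\text{(across a row)},
\qquad
\sum_{j=1}^{k} \lambda_j = \sum_{i=1}^{m} h_i^{2} .
```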
FACTOR MATRIX
This m-variable by k-factor matrix contains the factor loadings of all variables on each factor.
FACTOR ROTATION
Given a Cartesian coordinate system where the axes are the factors and the points are the variables, factor rotation is the process of holding the points constant and moving (rotating) the factor axes. The rotation is done in a manner so that the points are highly correlated with the axes and provide a more meaningful interpretation of the factor solution.
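A minimal sketch of this idea for two orthogonal factors follows. The loadings are hypothetical and the 45-degree angle is chosen by eye for illustration; the function name `rotate` is an assumption introduced here.

```python
import math


def rotate(loadings, degrees):
    """Rotate two orthogonal factor axes; the variable points stay fixed."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    # Coordinates of each fixed point measured against the rotated axes.
    return [(x * c + y * s, -x * s + y * c) for x, y in loadings]


# Hypothetical unrotated loadings of four variables on two factors.
unrotated = [(0.70, 0.50), (0.65, 0.55), (0.60, -0.55), (0.55, -0.60)]
rotated = rotate(unrotated, 45)

for before, after in zip(unrotated, rotated):
    h2_before = before[0] ** 2 + before[1] ** 2
    h2_after = after[0] ** 2 + after[1] ** 2
    print(f"{before} -> ({after[0]:.3f}, {after[1]:.3f})  "
          f"communality {h2_before:.3f} -> {h2_after:.3f}")
```

After the rotation, each variable loads mainly on one axis, while each variable's communality (its squared distance from the origin) is unchanged, since only the axes moved.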
FACTOR SCORES
This is the score of each observation on the newly identified factors. This factor score is a linear combination of all of the original variables that were relevant in making the new factor.
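Written out, a factor score is a weighted sum of an observation's standardized values. In the sketch below, z_ni is the standardized value of variable i for observation n, b_ij is the factor score coefficient of variable i on factor j, and m is the number of variables; this notation is assumed here. The second expression applies to unrotated principal components only, where Lambda is the diagonal matrix of eigenvalues.

```latex
f_{nj} = \sum_{i=1}^{m} b_{ij}\, z_{ni},
\qquad
B = A\,\Lambda^{-1}
\quad\text{(principal components, unrotated)} .
```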
GAMMA OF ROTATION
A user input parameter that leads to different rotation schemes. Standard values of gamma include 0 (for quartimax, quartimin, direct quartimin), .5 (for bi-quartimin), and 1.0 (for varimax and covarimin).
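The role of gamma can be seen in the standard textbook forms of the rotation criteria, sketched below with a_ij as the rotated loading of variable i on factor j and m variables. Whether BMD08M uses exactly these forms is an assumption. For orthogonal rotation, the orthomax criterion is maximized; for oblique rotation, the oblimin criterion is minimized.

```latex
\begin{aligned}
\text{orthomax:}\quad
Q(\gamma) &= \sum_{j=1}^{k}\left[\sum_{i=1}^{m} a_{ij}^{4}
  - \frac{\gamma}{m}\Bigl(\sum_{i=1}^{m} a_{ij}^{2}\Bigr)^{2}\right],
\\[4pt]
\text{oblimin:}\quad
B(\gamma) &= \sum_{j<j'}\left[\sum_{i=1}^{m} a_{ij}^{2}\,a_{ij'}^{2}
  - \frac{\gamma}{m}\Bigl(\sum_{i=1}^{m} a_{ij}^{2}\Bigr)
    \Bigl(\sum_{i=1}^{m} a_{ij'}^{2}\Bigr)\right].
\end{aligned}
```

Setting gamma to 0 in the first expression gives quartimax and setting it to 1 gives varimax; in the second, gamma values of 0, .5, and 1 give quartimin, bi-quartimin, and covarimin respectively.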
KAISER NORMALIZATION
A process by which each row of the initial factor loading matrix is normalized by dividing by the square root of h_i², the row's communality. This normalization makes the sum of squares for each row equal 1.0. This transformation does not affect the varimax solution.
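The row scaling can be sketched in a few lines. The loadings below are hypothetical, the function names are assumptions introduced here, and the sketch assumes every variable has a nonzero communality.

```python
import math


def kaiser_normalize(loadings):
    """Scale each row to unit length; also return the scale factors h_i."""
    # h_i is the square root of the row's communality (assumed nonzero).
    h = [math.sqrt(sum(a * a for a in row)) for row in loadings]
    normalized = [[a / hi for a in row] for row, hi in zip(loadings, h)]
    return normalized, h


def kaiser_denormalize(loadings, h):
    """Undo the scaling, typically after the rotation has been applied."""
    return [[a * hi for a in row] for row, hi in zip(loadings, h)]


# Hypothetical initial loadings of three variables on two factors.
initial = [[0.70, 0.50], [0.30, 0.20], [0.60, -0.55]]
normalized, h = kaiser_normalize(initial)

for row in normalized:
    print([round(a, 3) for a in row], "row sum of squares:",
          round(sum(a * a for a in row), 3))
```

The usual sequence is to normalize, rotate the normalized rows, and then multiply each row by its h_i again so the rotated loadings return to their original scale.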
OBLIMIN
Also called simple structure and refers to the rotated factor loadings matrix. Simple structure is difficult to define in that it refers to the situation where most of the loadings on any specific factor are small and a few loadings are as large as possible.
OBLIQUE FACTOR SOLUTIONS
A computed factor solution where the extracted factors are not independent, but are correlated. In many situations, there is no a priori (or theoretical) reason why the factors should be independent of each other. The analysis therefore expresses the relationships between factors that may or may not be orthogonal, rather than arbitrarily constraining the solution so that the factors are independent of each other.
ORTHOGONAL
Refers to mathematical independence of the factors. Operationally, orthogonal factor axes are at right angles to each other (90°).
ORTHOGONAL FACTOR SOLUTIONS
The cosine of the angle between two factor axes corresponds to the correlation between those factors. Orthogonality means no correlation and corresponds to a 90° angle in a Cartesian coordinate system. Orthogonal factor solutions extract the factors so that the factor axes are maintained at right angles. Thus each factor is independent of all other factors, and the correlation between the factors is zero.
SQUARED FACTOR LOADINGS
Because loadings are the correlations between the variables and the factors, the squared factor loadings can be compared to R-Square in a regression analysis. A squared factor loading indicates the percentage of the variance of the original variable that is explained by the factor. For a given factor, the sum of these squared factor loadings is the eigenvalue, or latent root, associated with that factor.
TRACE
The sum of the numbers on the diagonal of the correlation matrix used in the factor analysis. With 1.0 in every diagonal position, the trace is equal to the number of variables, based on the assumption that the variance of each variable is equal to 1. In common factor analysis, the trace is equal to the sum of the communalities on the diagonal of the reduced correlation matrix, which is also the amount of common variance for the variables being analyzed.
VARIMAX ROTATIONS
An orthogonal rotation of factors that redistributes the variance accounted for within the pattern of factor loadings. Both the communalities and the total variance accounted for are the same before and after rotation. This is the procedure most commonly used to re-orient, or clean up, the loadings obtained in a principal components analysis.
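A compact sketch of a varimax rotation follows, using the classical approach of rotating one pair of factors at a time by the angle that maximizes the criterion for that pair. It works on raw loadings (no Kaiser normalization), the input loadings are hypothetical, and it is not claimed to reproduce the BMD08M implementation.

```python
import math


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for col in zip(*loadings):
        sq = [a * a for a in col]
        total += sum(x * x for x in sq) / p - (sum(sq) / p) ** 2
    return total


def varimax(loadings, sweeps=100, tol=1e-10):
    """Raw varimax by successive planar rotations of factor pairs."""
    a = [list(row) for row in loadings]
    p, k = len(a), len(a[0])
    for _ in range(sweeps):
        largest = 0.0
        for j in range(k - 1):
            for l in range(j + 1, k):
                u = [r[j] ** 2 - r[l] ** 2 for r in a]
                v = [2 * r[j] * r[l] for r in a]
                su, sv = sum(u), sum(v)
                c4 = sum(x * x - y * y for x, y in zip(u, v))
                s4 = 2 * sum(x * y for x, y in zip(u, v))
                # Angle that maximizes the criterion for this pair of factors.
                phi = 0.25 * math.atan2(s4 - 2 * su * sv / p,
                                        c4 - (su * su - sv * sv) / p)
                c, s = math.cos(phi), math.sin(phi)
                for r in a:
                    r[j], r[l] = r[j] * c + r[l] * s, -r[j] * s + r[l] * c
                largest = max(largest, abs(phi))
        if largest < tol:  # no pair needed a meaningful rotation
            break
    return a


# Hypothetical unrotated loadings of four variables on two factors.
unrotated = [[0.70, 0.50], [0.65, 0.55], [0.60, -0.55], [0.55, -0.60]]
rotated = varimax(unrotated)

for before, after in zip(unrotated, rotated):
    print(before, "->", [round(x, 3) for x in after])
print("criterion:", round(varimax_criterion(unrotated), 4),
      "->", round(varimax_criterion(rotated), 4))
```

The printout lets the two properties stated above be checked directly: each row keeps the same sum of squared loadings (its communality), while the criterion value does not decrease.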
- Mean and Standard Deviation for the variables
- Variance-Covariance Matrix
- Correlation Matrix
- N Matrix
- Eigenvalues
- Cumulative proportion of total variance
- Proportion of Variance per Eigenvalue
- Factor Matrix before rotation
- Rotated Factor Matrix
- Factor Score Coefficients