Factor Analysis


This tutorial and the associated technical appendix are adapted from the BMD (BIOMED) statistical package documentation for the BMD08M factor analysis program.
The documentation and the BMD08M program were developed under a National Science Foundation grant.


OVERVIEW


Factor analysis is a data reduction technique for identifying the internal structure of a set of variables. Unlike techniques such as regression analysis or ANOVA, factor analysis does not require that predictor and criterion variables be defined. Instead, it attempts to identify the relationships among all variables included in the analysis set.

Factor analysis is decompositional in nature in that it identifies the underlying relationships that exist within a set of variables. It groups metric variables (interval or ratio scaled) into composites called factors. A factor is an underlying quality found to be characteristic of the original variables. Two types of factors exist. Common factors have effects shared by more than one observed variable. Unique factors have effects that are unique to a specific variable.
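
The split between common and unique factors is conventionally written as a linear model. The form below is a sketch of that standard model, not a formula taken from the BMD documentation; the symbols z_j (standardized variable j), a_{ji} (loading of variable j on common factor i), F_i (common factor), U_j (unique factor) and d_j (its weight) are notation assumed here for illustration, with m variables and k common factors.

```latex
z_j = a_{j1} F_1 + a_{j2} F_2 + \cdots + a_{jk} F_k + d_j U_j,
\qquad j = 1, \dots, m
```

Each common factor F_i appears in the equation of more than one variable, whereas U_j appears only in the equation for variable j.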

OBJECTIVES OF THE FACTOR ANALYSIS

The basic objectives of a Factor Analysis are:

  • To determine how many factors are needed to explain the set of variables.
  • To find the extent to which each variable is associated with each of a set of common factors.
  • To interpret the common factors.
  • To determine the amount of each factor possessed by each observation (identified by the factor scores).

In summary, the goal is to explain a portion of the variance in the set of variables input into the analysis by identifying certain underlying common dimensions, called factors. Factor analysis helps identify this set of k dimensions underlying the m variables in a data set (where k < m).

A Factor Analysis Example

For discussion purposes, consider the following five-variable data set, which is used later as input to the factor analysis program.


79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 
32165 56523 65454 16589 98965 73195 15937 35079 62486 46428  

These data represent the scores (on a 0 to 9 scale) of 20 students on five final exams (e.g. Math, English, History, Geography, Science). Each five-digit group is one student's record, one digit per exam. Can we say that the students' exam grades in the different subjects are related? The relationships among the grades are not directly measurable but are, in fact, latent. Grades in different courses could be related because of the student's intellectual capabilities, memory capacity, or simply interest. Although one person's test grades may not be perfectly correlated with one another, we can expect the grades in all subject areas to depend to some degree on general intelligence or on other factors common to learning the subject material. Accordingly, we may identify one or more factors that explain the 'common' portion of the variance in the original raw scores.
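
As a minimal sketch of how the records above can be read, the following splits each five-digit group into five scores and computes the mean and sample standard deviation of each variable. The names RECORDS, scores, means and sds are illustrative only; the labels V1 to V5 follow the sample output later in this tutorial.

```python
from statistics import mean, stdev

# The 20 five-digit records from the example; each digit is one exam score.
RECORDS = (
    "79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
    "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428"
).split()

# Rows are students, columns are the five variables V1..V5.
scores = [[int(digit) for digit in record] for record in RECORDS]

# Transpose to obtain one tuple of 20 scores per variable.
columns = list(zip(*scores))
means = [mean(col) for col in columns]
sds = [stdev(col) for col in columns]  # sample SD (n - 1 denominator)

for name, m, s in zip(["V1", "V2", "V3", "V4", "V5"], means, sds):
    print(f"{name}  mean={m:.4f}  sd={s:.5f}")
```

The means and standard deviations computed this way can be compared with the MEAN and STAND. DEV. columns of the sample output.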


Organizing Your Data for Factor Analysis

Data sets are traditionally in the form of an observations-by-variables matrix. Some researchers may, however, need to analyze data that do not conform to this traditional form. For example, occasions (repeated measures) may be included, or the data matrix may be transposed. Each of these data forms may be analyzed using factor analysis, but the result is then a decomposition of observations or occasions. Alternate forms of the factor analysis data matrix appear below. (The most common forms of factor analysis are R-Type, where factors are loaded by variables and are computed across the persons, and Q-Type, where factors are loaded by persons and are computed across the variables.)
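
The difference between the R-Type and Q-Type arrangements comes down to which dimension of the data matrix is correlated. The sketch below uses NumPy (an assumption; the original program is not written in Python) on the 20-student example data, and the names X, r_type and q_type are illustrative.

```python
import numpy as np

RECORDS = (
    "79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
    "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428"
).split()

# 20 persons (rows) by 5 variables (columns).
X = np.array([[int(d) for d in rec] for rec in RECORDS], dtype=float)

# R-Type: variables are correlated across persons, giving a 5 x 5 matrix.
r_type = np.corrcoef(X, rowvar=False)

# Q-Type: the data matrix is transposed first, so persons are correlated
# across variables, giving a 20 x 20 matrix.
q_type = np.corrcoef(X.T, rowvar=False)

print(r_type.shape, q_type.shape)
```

With only five variables per person, the Q-Type correlations here rest on very few values each, so this is purely a demonstration of the data layout.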

Graphical Portrayal of Modes of Factor Analysis

The alternative modes of factor analysis can be portrayed graphically. The original data set is viewed as a variables-persons-occasions matrix. R-Type and Q-Type techniques deal with the variables-persons dichotomy. In contrast, P-Type and O-Type analyses are used for the occasions-variables situation, and S-Type and T-Type are used when the occasions-persons relationship is of interest.


                VARIABLES                 VARIABLES
              +-----------+             +-----------+
              |||         |             |-----------|
PERSONS       ||| R-TYPE  |  PERSONS    |  Q-TYPE   |
              |||         |             |-----------|
              +-----------+             +-----------+

                VARIABLES                 VARIABLES
              +-----------+             +-----------+
              |||         |             |-----------|
OCCASIONS     ||| P-TYPE  |  OCCASIONS  |  O-TYPE   |
              |||         |             |-----------|
              +-----------+             +-----------+

                 PERSONS                   PERSONS
              +-----------+             +-----------+
              |||         |             |-----------|
OCCASIONS     ||| S-TYPE  |  OCCASIONS  |  T-TYPE   |
              |||         |             |-----------|
              +-----------+             +-----------+

DEFINITION OF TERMS COMMONLY USED IN FACTOR ANALYSIS

BIQUARTIMIN: The factor loadings matrix is transformed by an oblique (so the factors are correlated) rotation such that there is one variable with a large squared loading on the factor and the rest of the variable loadings on the factor would be close to zero.

COMMON FACTOR ANALYSIS: Factor analysis based upon a correlation matrix with values less than 1.0 on the diagonal. The values on the diagonal are known as communalities; they are inserted in the diagonal so that the factor analysis solves only for the common variance (excluding specific and error variance).

COMMUNALITY: The amount of variance in the variable shared with all other variables.

PRINCIPAL COMPONENTS ANALYSIS: One variety of factor analysis. The factors are based upon an analysis of the total variance in the original data. In application, this means that the factor analysis begins with a correlation matrix that has the value 1 on the diagonal. Computationally, this treats 100% of each variable's variance as common, or shared among the variables. Other forms of factor analysis may begin with other values in the diagonal that reflect the amount of variance expected to be explained for each variable.

CORRELATION MATRIX: A table showing the intercorrelations among all variables analyzed.

EIGENVALUE: The sum of squares of the loadings in a column in the factor matrix. Eigenvalues are also referred to as latent roots and represent the amount of variance accounted for by a factor.

FACTOR: The smaller set of underlying composite dimensions of all variables in the data set. Factors are linear combinations of the original variables.

FACTOR LOADINGS: These are the correlation coefficients between the variables and the factors. The variables with the highest correlations provide the most meaning (in an interpretation sense) to the factor solution. The squared loadings for a given factor sum to the eigenvalue for that factor.
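
The link between eigenvalues and loadings can be checked numerically. The following is a sketch for the principal components case (ones on the diagonal), using NumPy's symmetric eigensolver on the example data; it is not the BMD08M algorithm itself, and the variable names are illustrative.

```python
import numpy as np

RECORDS = (
    "79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
    "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428"
).split()
X = np.array([[int(d) for d in rec] for rec in RECORDS], dtype=float)

# Correlation matrix of the five variables (unit diagonal).
R = np.corrcoef(X, rowvar=False)

# eigh returns eigenvalues of a symmetric matrix in ascending order;
# reorder so the largest latent root comes first.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Loadings: each eigenvector scaled by the square root of its eigenvalue.
# The sign of each column is arbitrary.
loadings = eigenvectors * np.sqrt(eigenvalues)

# Summing squared loadings down each column recovers the eigenvalue.
column_ss = (loadings ** 2).sum(axis=0)
print(np.round(eigenvalues, 5))
print(np.round(column_ss, 5))
```

Because the sign of an eigenvector is arbitrary, a column of loadings may come out with all signs reversed relative to another program's output; the squared loadings are unaffected.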

FACTOR MATRIX: This m variable by k factor matrix contains the factor loadings of all variables on each factor.

FACTOR ROTATION: Given a Cartesian coordinate system where the axes are the factors and the points are the variables, factor rotation is the process of holding the points constant and moving (rotating) the factor axes. The rotation is done in a manner so that the points are highly correlated with the axes and provide a more meaningful interpretation of the factor solution.

FACTOR SCORES: This is the score of each observation on the newly identified factors. This factor score is a linear combination of all of the original variables that were relevant in making the new factor.
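
As an illustration of factor scores as linear combinations, the sketch below multiplies the standardized example data by the factor score coefficients printed in the sample output of this tutorial. It assumes those coefficients apply to standardized variables (z-scores computed with the sample standard deviation); the names Z, B and scores are illustrative.

```python
import numpy as np

RECORDS = (
    "79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
    "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428"
).split()
X = np.array([[int(d) for d in rec] for rec in RECORDS], dtype=float)

# Standardize each variable (n - 1 denominator, as in the printed SDs).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Factor score coefficients copied from the sample output
# (rows V1..V5, columns factor 1 and factor 2).
B = np.array([
    [ 0.2377,  0.5815],
    [ 0.3914,  0.1426],
    [ 0.3595, -0.0640],
    [-0.2431,  0.6509],
    [-0.2873,  0.0976],
])

# One row of two factor scores per student.
scores = Z @ B
print(np.round(scores[:3], 3))
```

Under that assumption the two score columns have mean zero, approximately unit variance, and are approximately uncorrelated, as expected for an orthogonal solution.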

GAMMA OF ROTATION: A user input parameter that leads to different rotation schemes. Standard values of gamma include 0 (for quartimax, quartimin, direct quartimin), .5 (for biquartimin), and 1.0 (for varimax and covarimin).

KAISER NORMALIZATION: A process by which each row of the initial factor loading matrix is normalized by dividing by the square root of the row's communality, h_i^2. This normalization makes the sum of squares for each row equal 1.0. This transformation does not affect the varimax solution.
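
The row scaling can be sketched as follows, using the unrotated two-factor loadings printed in the sample output of this tutorial. The names A, h2, A_norm and A_back are illustrative; restoring the original scale after rotation by multiplying each row by the same square root is the usual companion step and is shown here as an assumption about typical practice.

```python
import numpy as np

# Unrotated loadings from the sample output (rows V1..V5, two factors).
A = np.array([
    [ 0.55331,  0.71481],
    [ 0.82906,  0.15512],
    [ 0.74201, -0.10205],
    [-0.44063,  0.83096],
    [-0.58819,  0.13983],
])

# Communality of each variable: sum of squared loadings across its row.
h2 = (A ** 2).sum(axis=1)

# Kaiser normalization: divide each row by the square root of its communality.
A_norm = A / np.sqrt(h2)[:, None]

# Undo the scaling (normally done after the rotation step).
A_back = A_norm * np.sqrt(h2)[:, None]

print(np.round((A_norm ** 2).sum(axis=1), 6))
```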

OBLIMIN: A general rotation criterion aimed at simple structure in the rotated factor loadings matrix. Simple structure is difficult to define precisely; it refers to the situation where most of the loadings on any specific factor are small and a few loadings are as large as possible.

OBLIQUE FACTOR SOLUTIONS: A computed factor solution where the extracted factors are not independent, but are correlated. In many situations, there is no a priori (or theoretical) reason why the factors should be independent of each other. The analysis is conducted to express the relationship between factors that may or may not be orthogonal, rather than arbitrarily constraining the factor solution so that the factors are independent of each other.

ORTHOGONAL: Refers to mathematical independence of the factors. Operationally, orthogonal factor axes are at right angles to each other (90°).

ORTHOGONAL FACTOR SOLUTIONS: The cosine of the angle between two factors in the factor solution corresponds to the correlation between those factors. Orthogonality refers to no correlation and is synonymous with a 90° angle in a Cartesian coordinate system. Orthogonal factor solutions therefore extract the factors so that the factor axes are maintained at right angles. Thus each factor is independent of all other factors and the correlation between the factors is zero.

SQUARED FACTOR LOADINGS: Because loadings are the correlations between the variables and the factors, the squared factor loadings can be compared to R-Square in a regression analysis. The squared factor loadings indicate the percentage of the variance of the original variable that is explained by the factor. For a given factor, the sum of these squared factor loadings is the eigenvalue or latent root associated with that factor.

TRACE: The sum of the numbers on the diagonal of the correlation matrix used in the factor analysis. With ones on the diagonal, the trace is equal to the number of variables, because the variance of each standardized variable is equal to 1. With the reduced (common factor) correlation matrix, the trace is equal to the sum of the communalities on the diagonal, which is also the amount of common variance for the variables being analyzed.
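
In symbols, and as a sketch using notation assumed here rather than taken from the BMD documentation (R for the correlation matrix with unit diagonal, R_h for the reduced matrix with communalities h_j^2 on the diagonal, and \lambda_i for the eigenvalues of R):

```latex
\operatorname{tr}(R) = \sum_{j=1}^{m} r_{jj} = m = \sum_{i=1}^{m} \lambda_i
\qquad\text{and}\qquad
\operatorname{tr}(R_h) = \sum_{j=1}^{m} h_j^2
```

For the five-variable example in this tutorial, the printed eigenvalues add up to the number of variables: 2.08418 + 1.25547 + 1.04697 + .36381 + .24957 = 5.00000.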

VARIMAX ROTATIONS: An orthogonal rotation of factors that redistributes the variance accounted for within the pattern of factor loadings. Both the communalities and the total variance accounted for are the same before and after rotation. This procedure is the one most commonly used to re-orient or clean up the loadings obtained in a principal components analysis.
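
A minimal sketch of an orthogonal rotation follows. It uses a widely circulated SVD-based iteration for the orthomax family (gamma = 1.0 giving raw varimax, without Kaiser normalization); this is a swapped-in technique, not the routine used by BMD08M, so its rotated loadings need not match the sample output in this tutorial. The function name varimax and its parameters are illustrative.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=50, tol=1e-8):
    """Orthomax rotation by repeated SVD; gamma=1.0 corresponds to raw varimax."""
    p, k = loadings.shape
    T = np.eye(k)            # accumulated orthogonal rotation matrix
    previous = 0.0
    for _ in range(max_iter):
        L = loadings @ T
        # Gradient-like matrix of the orthomax criterion at the current rotation.
        G = loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0)))
        u, s, vt = np.linalg.svd(G)
        T = u @ vt           # nearest orthogonal matrix to G
        current = s.sum()
        if previous != 0.0 and current / previous < 1.0 + tol:
            break
        previous = current
    return loadings @ T, T

# Unrotated loadings from the sample output (rows V1..V5, two factors).
A = np.array([
    [ 0.55331,  0.71481],
    [ 0.82906,  0.15512],
    [ 0.74201, -0.10205],
    [-0.44063,  0.83096],
    [-0.58819,  0.13983],
])

rotated, T = varimax(A)
print(np.round(rotated, 5))
```

Because T is orthogonal, the row sums of squares (communalities) and the total variance accounted for are identical before and after rotation, which is the property stated in the definition above.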


AN EXAMPLE OF FACTOR ANALYSIS

Factor analysis may be run on either a raw data set or a correlation matrix. Initial communality estimates may be squared multiple correlations, regression variances, maximum absolute row values, or values specified by the user. If requested, the program will iterate on the initial communality estimates. Multiple types of rotations are available, all based on the oblimin criterion. In the first, the factors are restricted to be orthogonal, which yields quartimax and varimax rotations (as well as other rotational solutions). In the second, the criterion is applied to the reference factor structure and the factors are allowed to be oblique, which yields standard oblimin rotations. In the third, the criterion is applied to the primary factor loadings, allowing the factors to be oblique and yielding simple loading rotations.
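
Two of the initial communality estimates mentioned above can be sketched directly from the correlation matrix. The code below uses the standard identity that the squared multiple correlation of variable j with the others equals 1 - 1/r^jj, where r^jj is the j-th diagonal element of the inverse correlation matrix; the variable names are illustrative and the exact procedure used by BMD08M is not shown in this tutorial.

```python
import numpy as np

RECORDS = (
    "79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
    "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428"
).split()
X = np.array([[int(d) for d in rec] for rec in RECORDS], dtype=float)
R = np.corrcoef(X, rowvar=False)

# Squared multiple correlations: 1 - 1 / diag(R^-1).
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# Maximum absolute row values: largest |r| in each row, ignoring the diagonal.
off_diagonal = np.abs(R)
np.fill_diagonal(off_diagonal, 0.0)
max_abs = off_diagonal.max(axis=1)

# Reduced correlation matrix with the chosen estimates on the diagonal.
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)

print(np.round(smc, 4))
print(np.round(max_abs, 5))
```

Placing either set of estimates on the diagonal gives the reduced matrix analyzed in common factor analysis, whereas leaving the ones in place gives a principal components analysis.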

Typical Results

Typical factor analysis output includes:

  1. Mean and Standard Deviation for the variables
  2. Variance-Covariance Matrix
  3. Correlation Matrix
  4. N Matrix
  5. Eigenvalues
  6. Cumulative proportion of total variance
  7. Proportion of Variance per Eigenvalue
  8. Factor Matrix before rotation
  9. Rotated Factor Matrix
  10. Factor Score Coefficients

Factor Analysis Sample Output

                    PC-MDS
                FACTOR ANALYSIS
 
 ANALYSIS TITLE      BMD08M TEST DATA                                                                  
 INPUT DATA FILE     A:FACTOR.DAT                                         
 OUTPUT PRINT FILE   A:FACTOR.PRN                                         
 NO. OF VARIABLES       5 
 DATA TREATED AS HAVING NO MISSING VALUES 
 
 DATA FOR RECORD:     1 
  .70E+01 .90E+01 .60E+01 .50E+01 .20E+01 
 
 DATA FOR RECORD:    20 
  .40E+01 .60E+01 .40E+01 .20E+01 .80E+01 
 
 VARIABLE        MEAN      STAND. DEV.    MINIMUM       MAXIMUM 
 V1             4.7500      2.53138       1.00000       9.00000 
 V2             5.4500      2.08945       2.00000       9.00000 
 V3             4.4500      2.28208        .00000       9.00000 
 V4             4.6500      2.32322       2.00000       9.00000 
 V5             4.6500      2.36810       1.00000       9.00000 
 
 CORRELATION MATRIX 
 V1           .10000E+01 
 V2           .42042E+00  .10000E+01 
 V3           .17538E+00  .61757E+00  .10000E+01 
 V4           .22597E+00 -.20438E+00 -.27647E+00  .10000E+01 
 V5          -.37534E+00 -.20051E+00 -.12515E+00  .38792E+00  .10000E+01 
                    1           2           3           4           5 
 N-MATRIX 
 V1               20 
 V2               20      20 
 V3               20      20      20 
 V4               20      20      20      20 
 V5               20      20      20      20      20 
                  1       2       3       4       5 

                      FACTOR  ANALYSIS  SUMMARY  STATISTICS 
 
 NUMBER OF CASES                            20 
 NUMBER OF VARIABLES                         5 
 MAX. ITERATIONS FOR COMMUNALITIES           1 
 MAX. ITERATIONS FOR ROTATION               50 
 NUMBER OF FACTORS TO BE ROTATED             2 
 EIGENVALUE CUTOFF CONSTANT               1.000000 
 UPPER LIMIT ON CORRELATION COEFFICIENT    .95000 
 DIAGONAL ELEMENTS ARE UNALTERED 
 VARIMAX ROTATION IS PERFORMED 
 
 EIGENVALUES  
          2.08418     1.25547     1.04697      .36381      .24957 
 
  CUMULATIVE PROPORTION OF TOTAL VARIANCE 
           .41684      .66793      .87732      .95009     1.00000 
 
                  PROPORTION OF VARIANCE PER EIGENVALUE 
 VARIANCE PERCENT 
           .............................................................. 
           .                                                            . 
     .4168 .***********                                                 . 
           .***********                                                 . 
           .***********                                                 . 
           .***********                                                 . 
     .2779 .***********                                                 . 
           .***********                                                 . 
           .*********** ***********                                     . 
           .*********** *********** ***********                         . 
     .1389 .*********** *********** ***********                         . 
           .*********** *********** ***********                         . 
           .*********** *********** ***********                         . 
           .*********** *********** *********** ***********             . 
           .*********** *********** *********** *********** *********** . 
           .............................................................. 
 EIGENVALUE           0           0           0           0           0 
                      1           2           3           4           5 
   
 VARIABLE          ESTIMATED         FINAL 
                   COMMUNALITY       COMMUNALITY 
 V1                 1.000000          .817099 
 V2                 1.000000          .711402 
 V3                 1.000000          .560990 
 V4                 1.000000          .884644 
 V5                 1.000000          .365514 

  FACTOR MATRIX BEFORE ROTATION 
  VAR# VARIABLE NAME                 FACTOR 
                     1        2 
    1  V1          .55331   .71481 
    2  V2          .82906   .15512 
    3  V3          .74201  -.10205 
    4  V4         -.44063   .83096 
    5  V5         -.58819   .13983 
 ORTHOGONAL ROTATION 
 ITERATION   SIMPLICITY 
             CRITERION 
     0       -1.095068 
     1       -1.095877 
     2       -1.095877 
 FACTOR -      1   VARIANCE ACCOUNTED FOR: .4168 
 VARIABLE 
    2        V2              .82062 
    3        V3              .74606 
    5        V5             -.59424 
    1        V1              .51822 
    4        V4             -.48016 
 FACTOR -      2   VARIANCE ACCOUNTED FOR: .2511 
 VARIABLE 
    4        V4              .80876 
    1        V1              .74064 
    2        V2              .19489 
    5        V5              .11132 
    3        V3             -.06618 
 ROTATED FACTOR MATRIX: 
  VAR# VARIABLE NAME                 FACTOR 
                     1        2 
    1  V1          .51822   .74064 
    2  V2          .82062   .19489 
    3  V3          .74606  -.06618 
    4  V4         -.48016   .80876 
    5  V5         -.59424   .11132 
  FACTOR SCORE COEFFICIENTS 
  VAR# VARIABLE NAME                 FACTOR 
                     1        2 
    1  V1           .2377    .5815 
    2  V2           .3914    .1426 
    3  V3           .3595   -.0640 
    4  V4          -.2431    .6509 
    5  V5          -.2873    .0976 
 FACTOR ANALYSIS COMPLETE, NORMAL END OF PROGRAM 
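
Several figures in the listing above can be cross-checked against one another. The sketch below copies the printed eigenvalues and unrotated loadings and recomputes the cumulative proportion of variance and the final communalities; the names used are illustrative, and small differences in the last digit are to be expected from rounding in the printed values.

```python
# Eigenvalues as printed in the sample output.
eigenvalues = [2.08418, 1.25547, 1.04697, 0.36381, 0.24957]

# The eigenvalues of a 5 x 5 correlation matrix sum to 5 (the trace).
total = sum(eigenvalues)

# Cumulative proportion of total variance.
cumulative = []
running = 0.0
for value in eigenvalues:
    running += value
    cumulative.append(running / total)

# Unrotated loadings on the two retained factors (rows V1..V5).
unrotated = [
    ( 0.55331,  0.71481),
    ( 0.82906,  0.15512),
    ( 0.74201, -0.10205),
    (-0.44063,  0.83096),
    (-0.58819,  0.13983),
]

# Final communality: sum of squared loadings across the retained factors.
communalities = [a * a + b * b for a, b in unrotated]

print([round(c, 5) for c in cumulative])
print([round(h, 6) for h in communalities])
```

The recomputed values agree with the CUMULATIVE PROPORTION OF TOTAL VARIANCE line and the FINAL COMMUNALITY column to within rounding.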

FACTOR ANALYSIS Technical Appendix

Image:factor1.gif

Image:factor2.gif

Image:factor3.gif