ANOVA

From QualtricsWiki

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

REQUIREMENTS

ONEWAY ANOVA tests the equality of group means for a single specified variable. The F ratio tests whether the differences between the group means are statistically significant.

In analysis of variance, the main research question is whether the samples come from populations with different means. The tests and estimation procedures of the analysis of variance rest on the following assumptions: a) Whatever the technique of data collection, the observations within each sampled population are normally distributed. b) The sampled populations have a common variance σ².

Mathematical Formulations:

A) The Sum of Squares: Let there be k populations with population means µ1, µ2, µ3, ..., µk, and let independent random samples of n1, n2, n3, ..., nk observations be selected from populations 1, 2, 3, ..., k, respectively. The Total Sum of Squares is then the sum of squared deviations of all n (n = n1 + n2 + n3 + ... + nk) x-values about their overall mean x̄, i.e.
TOTAL SS = SSx = Σ (xi - x̄)²
For hand computation this equals Σ xi² - CM, where CM is the correction for the mean defined below.

The Total Sum of Squares can be partitioned into two components, one for each source of variation:

i) Sum of Squares for Treatment (SST):
SST = Σ (Ti² / ni) - CM, with the sum taken over i = 1, 2, ..., k

where:
Ti = Total of all observations receiving the treatment i (or of the ith population)
ni = Number of observations receiving the treatment i (or of the ith population)
CM = Correction for the mean = T²/n
T = Total of all observations = ( T1 + T2 + T3 + ....... + Tk )
n = Total number of Observations = ( n1 + n2 + n3 + ....... + nk )


ii) Sum of Squares for Error (SSE):
SSE = Σi Σj (xij - x̄i)², where xij is the j-th observation in the i-th sample and x̄i is the mean of that sample

SSE is usually computed more simply from the equation:
SSERROR = SSTOTAL - SSTREATMENT
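
The partition of the Total Sum of Squares can be checked numerically. The following is a minimal Python sketch added for illustration, not part of the original page; the function name `sums_of_squares` and the three small samples are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
def sums_of_squares(groups):
    """Return (total_ss, sst, sse) for a list of samples, one list per treatment."""
    all_x = [x for g in groups for x in g]
    n = len(all_x)              # total number of observations
    T = sum(all_x)              # total of all observations
    cm = T ** 2 / n             # correction for the mean, CM = T^2 / n

    # TOTAL SS in its computational form: sum of x^2 minus CM
    total_ss = sum(x * x for x in all_x) - cm
    # SST = sum over treatments of Ti^2 / ni, minus CM
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm
    # SSE from its definition: squared deviations about each sample's own mean
    sse = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        sse += sum((x - mean_g) ** 2 for x in g)
    return total_ss, sst, sse


# Hypothetical data: three treatments with 3, 3 and 4 observations.
groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3, 4]]
total_ss, sst, sse = sums_of_squares(groups)
print(round(total_ss, 4), round(sst, 4), round(sse, 4))  # 60.9 51.9 9.0
```

Here SSE is computed directly from its definition rather than by subtraction, so the agreement of SST + SSE with TOTAL SS is a genuine check of the identity.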


B) The Degrees of Freedom:
The degrees of freedom for the Total Sum of Squares is always (n - 1); where n = Total number of observations in all samples = ( n1 + n2 + n3 + ....... + nk )
The degrees of freedom of the Model (Treatment) is always (k - 1); where k = Total number of populations being analyzed.
The degrees of freedom of the Error is always (n - k).
The following relationship always holds:

D.F.(Treatment) + D.F.(ERROR) = (k-1) + (n-k) = (n-1) = D.F.(TOTAL SS)

C) The Mean Square:
Each mean square gives an estimate of σ²: one based on the variation among the sample means (corresponding to the model) and one based on the variation within the samples (corresponding to the error). The former estimates σ² without bias only when the population means are equal. These estimates are calculated by dividing each sum of squares by its degrees of freedom. Thus,

  1. The Mean Square for Treatment (Model) = MST = (SST)/(k-1)
  2. The Mean Square of the Error = MSE = (SSE)/(n-k)

(The MSE is a pooled estimate of σ² based on the sum of squares of deviations of the x-values about their respective sample means and is also denoted by s².)

D) The F Statistic:
The F statistic compares the two estimates of σ², MS(Treatment) and MS(Error), and is given by F = MS(Treatment)/MS(Error).
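
Sections B to D can be put together in a few lines of code. The sketch below is illustrative Python, not part of the original page; the function name `one_way_anova`, the dictionary keys and the sample data are all hypothetical. It derives the degrees of freedom, the two mean squares and the F statistic from raw samples.

```python
def one_way_anova(groups):
    """Build the one-way ANOVA quantities for a list of samples."""
    k = len(groups)                          # number of populations
    n = sum(len(g) for g in groups)          # total number of observations
    grand_total = sum(sum(g) for g in groups)
    cm = grand_total ** 2 / n                # correction for the mean

    total_ss = sum(x * x for g in groups for x in g) - cm
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm
    sse = total_ss - sst                     # SSE by subtraction

    df_treatment = k - 1
    df_error = n - k
    mst = sst / df_treatment                 # Mean Square for Treatment
    mse = sse / df_error                     # Mean Square of the Error
    return {
        "df_treatment": df_treatment,
        "df_error": df_error,
        "df_total": n - 1,
        "sst": sst,
        "sse": sse,
        "mst": mst,
        "mse": mse,
        "f": mst / mse,
    }


# Hypothetical data: three treatments with 3, 3 and 4 observations.
table = one_way_anova([[4, 5, 6], [7, 8, 9], [1, 2, 3, 4]])
print(table["df_treatment"], table["df_error"], round(table["f"], 2))  # F is about 20.18
```

For these samples the treatment and error degrees of freedom are 2 and 7, which sum to the total of 9 as the relationship in section B requires.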

E) The ANALYSIS:
The ANOVA tests the null hypothesis H0: μ1 = μ2 = μ3 = ... = μk against the alternative that at least one population mean differs.
Next, the F-value for degrees of freedom v1 (the D.F. of the numerator, MS(Treatment), i.e. v1 = k-1) and v2 (the D.F. of the denominator, MS(Error), i.e. v2 = n-k) at the significance level used in the analysis is obtained from the tables.

This tabulated F-value is compared with the computed F statistic.
If the tabulated F-value is greater than the computed F statistic, then we say that THERE IS INSUFFICIENT EVIDENCE TO REJECT THE NULL HYPOTHESIS AT THE GIVEN LEVEL OF SIGNIFICANCE.

But if the tabulated F-value is less than or equal to the computed F statistic, then we say that THERE IS SUFFICIENT EVIDENCE TO REJECT THE NULL HYPOTHESIS AT THE GIVEN LEVEL OF SIGNIFICANCE, which leads to the conclusion that at least one of the population means (μi) is different from the others.

The observed significance level (often called the p-value) is the significance level at which the tabulated F-value for degrees of freedom v1 and v2 equals the computed F statistic. Another way of testing the null hypothesis is to use this observed significance level: if it is less than or equal to the significance level set for the test, the null hypothesis is rejected.
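
The observed significance level can also be computed rather than read from tables. The sketch below is illustrative Python using only the standard library; the function names (`f_sf`, `f_test`, and the helpers) are hypothetical. It relies on the standard identity that the upper-tail probability of the F distribution equals the regularized incomplete beta function I_x(v2/2, v1/2) with x = v2 / (v2 + v1·F), evaluated here with a textbook continued-fraction method. A statistics library would normally be used instead.

```python
import math


def _clamp(v, tiny=1e-300):
    """Keep a continued-fraction term away from zero."""
    return v if abs(v) > tiny else tiny


def _beta_cf(a, b, x, max_iter=300, eps=1e-13):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _clamp(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the recurrence
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _clamp(1.0 + aa * d)
        c = _clamp(1.0 + aa / c)
        h *= d * c
        # odd step of the recurrence
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _clamp(1.0 + aa * d)
        c = _clamp(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f_stat, v1, v2):
    """Observed significance level: P(F >= f_stat) for v1 and v2 degrees of freedom."""
    if f_stat <= 0.0:
        return 1.0
    return reg_inc_beta(v2 / 2.0, v1 / 2.0, v2 / (v2 + v1 * f_stat))


def f_test(f_stat, v1, v2, alpha=0.05):
    """Return (p_value, reject_h0) using the rule: reject when p_value <= alpha."""
    p_value = f_sf(f_stat, v1, v2)
    return p_value, p_value <= alpha


# F ratio and degrees of freedom taken from the V19 table in Example Two below;
# the result can be compared with the F PROB. of .036 printed there.
p_value, reject = f_test(2.919, 3, 164)
print(round(p_value, 3), reject)
```

Comparing the p-value with the chosen significance level gives the same decision as comparing the computed F statistic with the tabulated F-value.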

+------------------------------------+
¦Example One: University Selection   ¦
+------------------------------------+

The table below gives the number of students graduating in six areas of study 
at four Universities.

+--------------------------------------------------------------+
¦              ¦                                   ¦           ¦
¦              ¦    IIT      BYU      CSU      VPI ¦ TOTAL     ¦
¦--------------+-----------------------------------+-----------¦
¦   SCIENCE    ¦    597      280      245      339 ¦   1461    ¦
¦   ACCOUNTING ¦    768      260      240      275 ¦   1543    ¦
¦   BUSINESS   ¦    776      284      257      304 ¦   1621    ¦
¦   LAW        ¦    739      334      262      317 ¦   1652    ¦
¦   HUMANITIES ¦    562      338      250      335 ¦   1485    ¦
¦   ENGINEERING¦    696      315      330      350 ¦   1691    ¦
¦--------------+-----------------------------------+-----------¦
¦   Sum        ¦   4138     1811     1584     1920 ¦   9453    ¦
¦   Average    ¦    690      302      264      320 ¦   1576    ¦
¦   Sum of Sq  ¦2894790   551681   423718   618176 ¦ 4488365   ¦
+--------------------------------------------------------------+

n1 = n2 = n3 = n4 = 6 ;
k  = 4 ;
n  = (n1 + n2 + n3 + n4) = 24

CM (Correction for the Mean) = T²/n = 9453² / 24 = 3,723,300.4

Total Sum of Squares (TSS) = SSx
= (Sum of squares of all 24 observations) - CM
= 4,488,365 - 3,723,300.4
= 765,064.6

Sum of Squares for the TREATMENT = SST
= Σ (Ti² / ni) - CM, summed over i = 1, ..., 4
= ((4138)² + (1811)² + (1584)² + (1920)²) / 6 - CM
= 4,433,036.8 - 3,723,300.4
= 709,736.4

Sum of Squares for the ERROR = SSE
= TSS - SST
= 765,064.6 - 709,736.4
= 55,328.2

Mean Square for the TREATMENT = MS(Treatment)
= SST/(k-1)
= 709,736.4 / 3
= 236,578.8

Mean Square for the ERROR = MS(Error) = SSE/(n-k) = 55,328.2 / 20 = 2,766.4

F statistic = MS(Treatment)/MS(Error) = 236,578.8 / 2,766.4 = 85.52

The ANOVA Table is given below:
+----------------------------------------------------+
¦   Source    d.f.     SS           MS          F    ¦
¦   Treatment  3    709736.4   236578.8      85.52   ¦
¦   Error     20     55328.2     2766.4              ¦
¦   Total     23    765064.6                         ¦
+----------------------------------------------------+

Using the standard tables, for α = 0.05, v1 (d.f.1) = 3, v2 (d.f.2) = 20, the tabulated
F-value is 3.10. The computed F statistic (85.52) is much greater than the F-value from
the tables. We can say that there is sufficient evidence to conclude that there is a
difference in the mean number of students graduating from the four universities.


+---------------------------------------+
¦Example Two: HOSPITAL STAFF EVALUATION ¦
+---------------------------------------+

The ONEWAY output below tests four patient-survey ratings (V18 to V21) for differences
between the groups of V5 (percent hospital coverage).
 
PATIENT SURVEY, FOUR LOCAL HOSPITALS                                                    
 VARIABLE     V18        COMPETENCE OF NURSING STAFF              
 
  GROUP   COUNT       MEAN   STD. DEV.   STD. ERR. 95 PCT CONF INT FOR MEAN 
     1     107        1.009      .096        .009      .991  TO     1.028 
     2      46        1.022      .146        .022      .979  TO     1.065 
     3      11        1.000      .000        .000     1.000  TO     1.000 
     4       2        1.000      .000        .000     1.000  TO     1.000 
  TOTAL    166        1.012      .109        .008      .995  TO     1.029 
 
 
  GROUP   MINIMUM   MAXIMUM 
     1     1.00        2.00 
     2     1.00        2.00 
     3     1.00        1.00 
     4     1.00        1.00  

                                                                                 
     VARIABLE    1     V18          COMPETENCE OF NURSING STAFF              
  BY VARIABLE    5     V5           PERCENT HOSPITAL COVERAGE                
 

                              ANALYSIS OF VARIANCE 
 
                                SUM OF       MEAN         F            F 
     SOURCE           DF (1)    SQUARES     SQUARE (2)  RATIO (3)    PROB. (4)
  BETWEEN GROUPS        3        .007        .002           .192      .902 
  WITHIN  GROUPS      162       1.969        .012 
  TOTAL               165       1.976 
 
 
  TESTS FOR HOMOGENEITY OF VARIANCES 
        COCHRANS C = MAX. VARIANCE/SUM VARIANCES =        .697 
        MAXIMUM VARIANCE / MINIMUM VARIANCE      =  999999.000 

 ******************************************************************************
  1. DF = Degrees of Freedom. Between groups, the Degree of Freedom is given by (k-1), where k = number of groups. Within groups (the variation not explained by the grouping), the Degree of Freedom is given by (n-k), where n = total number of observations in the data set.
  2. Mean Square = (Sum of Squares)/(DF)
  3. F Ratio = (Mean Square Between Groups)/(Mean Square Within Groups)
  4. F-Prob = Level of significance corresponding to the F Value
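
As a quick check of notes 2 and 3, the mean squares and F ratio of the V18 table can be recomputed from its printed sums of squares and degrees of freedom. This is a small Python sketch added for illustration; the variable names are hypothetical, and because the printed sums of squares are rounded to three decimals the results can be expected to agree with the table only to about that precision.

```python
# Values read from the BETWEEN GROUPS and WITHIN GROUPS rows of the V18 table.
df_between, ss_between = 3, 0.007
df_within, ss_within = 162, 1.969

# Note 2: Mean Square = (Sum of Squares) / (DF)
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# Note 3: F Ratio = between-groups mean square / within-groups mean square
f_ratio = ms_between / ms_within

print(f"{ms_between:.3f} {ms_within:.3f} {f_ratio:.3f}")  # 0.002 0.012 0.192
```
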
 PATIENT SURVEY, LOCAL HOSPITAL                                                    
 VARIABLE     V19        PROMPTNESS OF RESPONSE                   
 
 
  GROUP   COUNT       MEAN   STD. DEV.   STD. ERR. 95 PCT CONF INT FOR MEAN 
     1     107        1.000      .193        .019      .963  TO     1.037 
     2      48        1.125     1.166        .168      .788  TO     1.462 
     3      11        1.818     2.289        .690      .438  TO     3.199 
     4       2        1.000      .000        .000     1.000  TO     1.000 
  TOTAL    168        1.089      .892        .069      .952  TO     1.227 
 
 
  GROUP   MINIMUM   MAXIMUM 
     1      .00        2.00 
     2      .00        9.00 
     3     1.00        9.00 
     4     1.00        1.00  

                                                                              

     VARIABLE    2     V19          PROMPTNESS OF RESPONSE                   
  BY VARIABLE    5     V5           PERCENT HOSPITAL COVERAGE                
 
                              ANALYSIS OF VARIANCE 
 
                                SUM OF       MEAN         F            F 
     SOURCE           D.F.     SQUARES     SQUARES         RATIO      PROB. 
  BETWEEN GROUPS        3       6.774       2.258          2.919      .036 
  WITHIN  GROUPS      164     126.886        .774 
  TOTAL               167     133.661 
 
 
 
  TESTS FOR HOMOGENEITY OF VARIANCES 
        COCHRANS C = MAX. VARIANCE/SUM VARIANCES =        .790 
        MAXIMUM VARIANCE / MINIMUM VARIANCE      =  999999.000 
 
  ****************************************************************************** 
 
 PATIENT SURVEY, LOCAL HOSPITAL                                                    
 VARIABLE     V20        FRIENDLINESS OF STAFF                    
 
 
  GROUP   COUNT       MEAN   STD. DEV.   STD. ERR. 95 PCT CONF INT FOR MEAN 
     1     107        1.019      .135        .013      .993  TO     1.045 
     2      48        1.167     1.161        .168      .832  TO     1.502 
     3      11        1.000      .000        .000     1.000  TO     1.000 
     4       2        1.000      .000        .000     1.000  TO     1.000 
  TOTAL    168        1.060      .633        .049      .962  TO     1.157 
 
 
  GROUP   MINIMUM   MAXIMUM 
     1     1.00        2.00 
     2      .00        9.00 
     3     1.00        1.00 
     4     1.00        1.00 
                                                                                
     VARIABLE    3     V20          FRIENDLINESS OF STAFF                    
  BY VARIABLE    5     V5           PERCENT HOSPITAL COVERAGE                
 
                              ANALYSIS OF VARIANCE 

                                 SUM OF       MEAN         F            F 
     SOURCE           D.F.     SQUARES     SQUARES         RATIO      PROB. 
  BETWEEN GROUPS        3        .775        .258           .636      .593 
  WITHIN  GROUPS      164      66.629        .406 
  TOTAL               167      67.405 
 
 
  TESTS FOR HOMOGENEITY OF VARIANCES 
        COCHRANS C = MAX. VARIANCE/SUM VARIANCES =        .987 
        MAXIMUM VARIANCE / MINIMUM VARIANCE      =  999999.000 

 ******************************************************************************
 
PATIENT SURVEY, LOCAL HOSPITAL                                                    
 VARIABLE     V21        NURSE EXPLAINING                         
 
  GROUP   COUNT       MEAN   STD. DEV.   STD. ERR. 95 PCT CONF INT FOR MEAN 
     1     107        1.047      .286        .028      .991  TO     1.102 
     2      48        1.125     1.166        .168      .788  TO     1.462 
     3      11        1.091      .287        .087      .918  TO     1.264 
     4       2        1.000      .000        .000     1.000  TO     1.000 
  TOTAL    168        1.071      .669        .052      .968  TO     1.175 
 
  GROUP   MINIMUM   MAXIMUM 
     1      .00        2.00 
     2      .00        9.00 
     3     1.00        2.00 
     4     1.00        1.00 
                                                                                
     VARIABLE    4     V21          NURSE EXPLAINING                         
  BY VARIABLE    5     V5           PERCENT HOSPITAL COVERAGE                
 
                              ANALYSIS OF VARIANCE 
 
                                SUM OF       MEAN         F            F 
     SOURCE           D.F.     SQUARES     SQUARES         RATIO      PROB. 
  BETWEEN GROUPS        3        .217        .072           .159      .924 
  WITHIN  GROUPS      164      74.925        .457 
  TOTAL               167      75.143 
 
  TESTS FOR HOMOGENEITY OF VARIANCES 
        COCHRANS C = MAX. VARIANCE/SUM VARIANCES =        .892 
        MAXIMUM VARIANCE / MINIMUM VARIANCE      =  999999.000
 
 ******************************************************************************  
  END OF ONEWAY ANALYSIS OF VARIANCE
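
Each block of output above ends with Cochran's C, which the output itself defines as the maximum group variance divided by the sum of the group variances. The sketch below (illustrative Python; the function names are hypothetical) recomputes it from the printed group standard deviations, so small differences from the printed values are expected from rounding. The MAXIMUM VARIANCE / MINIMUM VARIANCE figure of 999999.000 is presumably a placeholder printed because at least one group has zero variance; that reading is an assumption, and the sketch returns None in that case.

```python
def cochran_c(std_devs):
    """Cochran's C = largest group variance / sum of group variances."""
    variances = [s ** 2 for s in std_devs]
    return max(variances) / sum(variances)


def variance_ratio(std_devs):
    """Largest / smallest group variance, or None when the smallest is zero."""
    variances = [s ** 2 for s in std_devs]
    smallest = min(variances)
    return max(variances) / smallest if smallest > 0 else None


# Group standard deviations and the Cochran's C value printed for each variable.
printed = {
    "V18": ([0.096, 0.146, 0.000, 0.000], 0.697),
    "V19": ([0.193, 1.166, 2.289, 0.000], 0.790),
    "V20": ([0.135, 1.161, 0.000, 0.000], 0.987),
    "V21": ([0.286, 1.166, 0.287, 0.000], 0.892),
}

recomputed = {name: cochran_c(sds) for name, (sds, _) in printed.items()}
for name, (sds, reported) in printed.items():
    print(name, round(recomputed[name], 3), reported, variance_ratio(sds))
```

A value of C close to 1 means that a single group accounts for almost all of the variance, which casts doubt on the common-variance assumption b) stated under REQUIREMENTS.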