The Use and Interpretation of Cross Analysis

From QualtricsWiki

The Use and Interpretation of the Cross-Tabulation

Cross-tabulation is one of the analytical mainstays of the market research industry. One estimate is that single variable frequency analysis and cross-tabulation analysis account for more than 90% of all research analyses.

Cross-tabulation analysis, also known as contingency table analysis, is most often used to analyze categorical (nominal measurement scale) data. A cross-tabulation is a two (or more) dimensional table that records the number (frequency) of respondents that have the specific characteristics described in the cells of the table. Cross-tabulation tables provide a wealth of information about the relationship between the variables.

Cross-Tabulation analysis has its own unique language, using terms such as "banners", "stubs", "Chi-Square Statistic," and "Expected Values" when describing the tables.

A typical cross-tabulation table comparing the two hypothetical variables "Favorite Color" with "Favorite Flavor" is shown below. The cells of the table report the frequency counts and percentages for the number of respondents in each cell.

Cross Tabulation
Frequency/Percent

                                  1. What is your favorite color
2. What is your favorite flavor   Red       Blue      Green     Row Totals
Cherry                            11        33        7         51
  Row Percent                     21.57%    64.71%    13.73%    34.93%
Grape                             23        14        9         46
  Row Percent                     50.00%    30.43%    19.57%    31.51%
Raspberry                         22        13        14        49
  Row Percent                     44.90%    26.53%    28.57%    33.56%
Column Totals                     56        60        30        146
  Column Percentage               38.36%    41.10%    20.55%    100%

X2: 19.35
Prob: < 0.001
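
As a rough illustration of how such a table is tallied, the sketch below counts (flavor, color) pairs and derives the row percentages. The list of raw responses is hypothetical: it is simply expanded from the cell counts in the table above, and the function name cross_tabulate is our own, not part of any survey tool.

```python
from collections import Counter

# Hypothetical raw data: one (flavor, color) pair per respondent,
# expanded here from the cell counts reported in the table.
cell_counts = {
    ("Cherry", "Red"): 11, ("Cherry", "Blue"): 33, ("Cherry", "Green"): 7,
    ("Grape", "Red"): 23, ("Grape", "Blue"): 14, ("Grape", "Green"): 9,
    ("Raspberry", "Red"): 22, ("Raspberry", "Blue"): 13, ("Raspberry", "Green"): 14,
}
responses = [pair for pair, n in cell_counts.items() for _ in range(n)]


def cross_tabulate(responses):
    """Tally (stub, banner) pairs into cell counts and row/column totals."""
    cells = Counter(responses)
    row_totals = Counter(stub for stub, _ in responses)
    col_totals = Counter(banner for _, banner in responses)
    return cells, row_totals, col_totals


cells, row_totals, col_totals = cross_tabulate(responses)
total = len(responses)

colors = ["Red", "Blue", "Green"]
for flavor in ["Cherry", "Grape", "Raspberry"]:
    counts = [cells[(flavor, color)] for color in colors]
    # Row percent: each cell count as a share of its row total.
    row_pct = [f"{100 * n / row_totals[flavor]:.2f}%" for n in counts]
    print(flavor, counts, row_pct, row_totals[flavor])
print("Column Totals", [col_totals[color] for color in colors], total)
```

Each cell is divided by its row total, which is why the percentages in every flavor row sum to 100%.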

In the above table, the text legend for the top axis (the columns) is referred to as the "Banner" and the legend for the rows is the "Stub". The online analysis tools allow you to create and analyze multiple tables in a side by side or sequential format. Tabulation professionals call the column variables in these multiple tables "Banners" and the row variables "Stubs".

Creating Simple and Advanced Cross-Tabulation Tables:

Banners (Columns): One or two banner variables may be selected. If two variables are selected they will appear side by side.

Stubs (Rows): Multiple variables may be selected. The row variables will appear consecutively one after another, each cross tabulated with the banner variables you selected.

Multiple Item Selection: Multiple Banner or Stub variables are selected by holding "CTRL" while clicking each variable with the mouse.

Other Use Guidelines:
1. Cross-tabulations are appropriate only for categorical (nominal or ordinal) data such as multiple choice questions. Cross-tabulation is not appropriate (and cannot be used) for analysis of open ended text questions or numerical input questions such as rank order or constant sum.

2. Subgroup Criteria is not available with Cross tabulation. Only simple cross tabulation is active.

Cross-Tabulation With Chi-Square Analysis

The Chi-square statistic is the primary statistic used for computing the statistical significance of the cross-tabulation table. Chi-square is used to test for statistical independence; that is, to see if the two variables are independent.

If the results of the statistical test are "non-significant", we "are not able to reject the null hypothesis" of independence, meaning that the data give us no evidence of a relationship between the variables.

If the results of the statistical test are "significant", we "are able to reject the null hypothesis", meaning that we can state that there is some relationship between the variables.

We use the chi-square statistic as the means of testing, or determining if the relationship is "statistically significant."

The chi-square statistic, along with the associated probability of chance observation, may be computed for any table. If the observed table relationships would occur with very low probability (say 5% or less) when the variables are in fact independent, then we say that the results are "statistically significant" at the ".05 or 5% level". This means that a table like the one observed would rarely arise by chance if the variables were independent. Depending on the cost of making mistakes, the researcher may apply more stringent criteria for declaring "significance" such as .01 or .005.

Students of statistics will recall that the probability values (.05 or .01) reflect the researcher's willingness to accept a type I error, or the probability of rejecting a true null hypothesis (meaning that we thought there was a relationship between the variables when there really wasn't). Furthermore, these error probabilities accumulate across tests: if 20 tables of unrelated variables are each tested at the .05 level, the researcher should expect about one table to be incorrectly found to have a relationship (20 x .05 = 1 expected false positive), and, if the tests are independent, the chance of at least one such error is about 64% (1 - .95^20).
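
The arithmetic behind the 20-table example can be sketched as follows. This is a minimal illustration that assumes the tests are independent of one another and that none of the tables has a real relationship; the function names are our own.

```python
def expected_false_positives(num_tests, alpha):
    """Average number of tables wrongly declared significant."""
    return num_tests * alpha


def chance_of_any_false_positive(num_tests, alpha):
    """Probability that at least one table is wrongly declared significant.

    Assumes independent tests and that every null hypothesis is true.
    """
    return 1 - (1 - alpha) ** num_tests


print(expected_false_positives(20, 0.05))                 # 1.0
print(round(chance_of_any_false_positive(20, 0.05), 4))   # 0.6415
```

Under these assumptions, testing 20 tables at the .05 level yields one false positive on average, and roughly a 64% chance of at least one.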

Computation of the Chi-Square Statistic for Cross-Tabulation Tables

The chi-square statistic is computed by first computing a chi-square value for each individual cell of the table and then summing them up to form a total value for the table. The chi-square value for the cell is computed as:

(observed value - expected value)^2 / (expected value)

Note that ^2 is notation for exponent 2 (squared).

Chi-Square…
Computations (each "Computation" row shows how the values in the row above it were derived)

Expected value = column total percent * row total

                          Favorite Color
                          Red                 Blue                Green               Row Totals
Cherry                    11.0000             33.0000             7.0000              51.0000
  Row Percent             0.2157              0.6471              0.1373              0.3493
  Expected                19.5616             20.9589             10.4795
  Computation (Expected)  .3836 * 51          .4110 * 51          .2055 * 51
  Cell Chi-Square         3.7472              6.9177              1.1553
  Computation (Cell x2)   (11-19.56)^2/19.56  (33-20.96)^2/20.96  (7-10.48)^2/10.48
Grape                     23.0000             14.0000             9.0000              46.0000
  Row Percent             0.5000              0.3043              0.1957              0.3151
  Expected                17.6438             18.9041             9.4521
  Computation (Expected)  .3836 * 46          .4110 * 46          .2055 * 46
  Cell Chi-Square         1.6260              1.2722              0.0216
  Computation (Cell x2)   (23-17.64)^2/17.64  (14-18.90)^2/18.90  (9-9.45)^2/9.45
Raspberry                 22.0000             13.0000             14.0000             49.0000
  Row Percent             0.4490              0.2653              0.2857              0.3356
  Expected                18.7945             20.1370             10.0685
  Computation (Expected)  .3836 * 49          .4110 * 49          .2055 * 49
  Cell Chi-Square         0.5467              2.5295              1.5352
  Computation (Cell x2)   (22-18.79)^2/18.79  (13-20.14)^2/20.14  (14-10.07)^2/10.07
Column Total              56.0000             60.0000             30.0000             146.0000
Column Total %            0.3836              0.4110              0.2055              1.0000

Chi-Square = 19.3514 (sum of the cell chi-square values)
Degrees of Freedom = (#Rows-1) * (#Columns-1) = (3-1)*(3-1) = 4
Chi-Square Probability = 0.00067
The probability of a chi-square value of 19.3514 with 4 df is not computed here, but can be looked up from a chi-square probability distribution table in a research textbook.
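
The whole computation can be reproduced with a short sketch using only the Python standard library. The function names are our own, and the tail-probability helper relies on a closed form that holds only when the degrees of freedom are even (as with the 4 df here); a statistics library or a textbook table would be needed for the general case.

```python
import math

observed = {
    "Cherry":    {"Red": 11, "Blue": 33, "Green": 7},
    "Grape":     {"Red": 23, "Blue": 14, "Green": 9},
    "Raspberry": {"Red": 22, "Blue": 13, "Green": 14},
}


def chi_square_table(observed):
    """Return (chi-square, degrees of freedom, per-cell chi-square values)."""
    rows = list(observed)
    cols = list(next(iter(observed.values())))
    row_totals = {r: sum(observed[r].values()) for r in rows}
    col_totals = {c: sum(observed[r][c] for r in rows) for c in cols}
    total = sum(row_totals.values())

    cell_chi = {}
    for r in rows:
        for c in cols:
            # Expected count under independence: row total * column total / N,
            # the same as column total percent * row total.
            expected = row_totals[r] * col_totals[c] / total
            cell_chi[(r, c)] = (observed[r][c] - expected) ** 2 / expected

    chi_square = sum(cell_chi.values())
    df = (len(rows) - 1) * (len(cols) - 1)
    return chi_square, df, cell_chi


def chi_square_tail_even_df(chi_square, df):
    """Upper-tail probability of a chi-square value (even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = chi_square / 2
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


chi_square, df, cell_chi = chi_square_table(observed)
p_value = chi_square_tail_even_df(chi_square, df)
print(f"chi-square = {chi_square:.4f}, df = {df}, p = {p_value:.5f}")
```

Run on the example counts, this should give the same chi-square value (19.3514), degrees of freedom (4) and probability (0.00067) as the table.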

In this example table, we observe that the chi-square value for the table is 19.3514. If the variables were independent, a value this large would occur by chance less than one time in 1000. We therefore reject the null hypothesis of independence and conclude that there is a relationship between the variables.

We can observe the relationship in two places in the table. The most obvious is in the chi-square value computed for each cell. We observe that the cells "blue and cherry", "red and cherry" and "blue and raspberry" have the largest cell chi-square values; these are the cells where the number of observed respondents differed most from the number expected. We further note that when we examine the expected and observed frequencies, the "red and cherry" and "blue and raspberry" frequencies were fewer than expected, while "blue and cherry" had more than expected.

Because the cell chi-square and the expected values are often not displayed, these same relationships can be observed by comparing the column total percent to the cell percent (of the row total). In cell "Blue and Cherry" we would compare 41.10% with 64.71% and observe that more respondents preferred "Blue and Cherry" than expected.
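
This comparison can be sketched for every cell at once. The snippet below is only an illustration using the example counts; the function name and the "more"/"fewer" labels are our own.

```python
observed = {
    "Cherry":    {"Red": 11, "Blue": 33, "Green": 7},
    "Grape":     {"Red": 23, "Blue": 14, "Green": 9},
    "Raspberry": {"Red": 22, "Blue": 13, "Green": 14},
}


def row_vs_column_percent(observed):
    """Return {(flavor, color): row percent - column total percent}."""
    colors = list(next(iter(observed.values())))
    col_totals = {c: sum(row[c] for row in observed.values()) for c in colors}
    total = sum(col_totals.values())

    differences = {}
    for flavor, row in observed.items():
        row_total = sum(row.values())
        for color, count in row.items():
            row_pct = 100 * count / row_total          # cell percent of row total
            col_pct = 100 * col_totals[color] / total  # column total percent
            differences[(flavor, color)] = row_pct - col_pct
    return differences


for (flavor, color), diff in row_vs_column_percent(observed).items():
    direction = "more than expected" if diff > 0 else "fewer than expected"
    print(f"{color} and {flavor}: {diff:+.2f} percentage points, {direction}")
```

A positive difference means the cell holds more respondents than independence would predict; a negative one means fewer. The size of the difference is only a rough guide, since it does not account for how many respondents are in the row.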

Caution is urged when interpreting relationships found in any statistical analysis. We often desire to "explain" or conclude "causality" from analyses and data that were not designed for, or do not have the power to support, such conclusions. In the current table we observe that "Blue and Cherry" was the most frequently observed combination of color and flavor preference. However we must be careful in concluding that a "Blue Cherry" drink would be a success... or that color preference may "cause" flavor preference. Blue and Cherry are the most preferred Colors and Flavors, but are most likely totally independent color and flavor concepts that have no other relationship.