The Use and Interpretation of Cross Analysis
From QualtricsWiki
The Use and Interpretation of the Cross-Tabulation
Cross-tabulation is one of the mainstay analytical tools of the market research industry. One estimate is that single-variable frequency analysis and cross-tabulation analysis together account for more than 90% of all research analyses.
Cross-tabulation analysis, also known as contingency table analysis, is most often used to analyze categorical (nominal measurement scale) data. A cross-tabulation is a table of two (or more) dimensions that records the number (frequency) of respondents who have the specific characteristics described in the cells of the table. Cross-tabulation tables provide a wealth of information about the relationship between the variables.
Cross-Tabulation analysis has its own unique language, using terms such as "banners", "stubs", "Chi-Square Statistic," and "Expected Values" when describing the tables.
A typical cross-tabulation table comparing the two hypothetical variables "Favorite Color" with "Favorite Flavor" is shown below. The cells of the table report the frequency counts and percentages for the number of respondents in each cell.
| Cross Tabulation Frequency/Percent | 1. What is your favorite color | | | |
| 2. What is your favorite flavor | Red | Blue | Green | Row Totals |
| Cherry | 11 | 33 | 7 | 51 |
| Row Percent | 21.57% | 64.71% | 13.73% | 34.93% |
| Grape | 23 | 14 | 9 | 46 |
| Row Percent | 50.00% | 30.43% | 19.57% | 31.51% |
| Raspberry | 22 | 13 | 14 | 49 |
| Row Percent | 44.90% | 26.53% | 28.57% | 33.56% |
| Column Totals | 56 | 60 | 30 | 146 |
| Column Percentage | 38.36% | 41.10% | 20.55% | 100% |
| X2: 19.35 | Prob: < 0.001 | | | |
In the above table, the text legend for the top axis (the column variable) is referred to as the "banner" and the legend for the rows (the row variable) is the "stub". The online analysis tools allow you to create and analyze multiple tables in a side-by-side or sequential format. Tabulation professionals likewise call the column variables in these multiple tables "banners" and the row variables "stubs".
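As a rough illustration of how the counts and percentages in such a table fit together, the sketch below rebuilds the row totals, column totals, and percentages from the nine cell counts. It uses plain Python. The variable names (`observed`, `row_percents`, and so on) are chosen here for illustration and are not part of any Qualtrics tool.

```python
# Observed counts from the example table.
# Rows are flavors (the stub); columns are colors (the banner).
colors = ["Red", "Blue", "Green"]
observed = {
    "Cherry": [11, 33, 7],
    "Grape": [23, 14, 9],
    "Raspberry": [22, 13, 14],
}

# Marginal totals.
row_totals = {flavor: sum(counts) for flavor, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values())
              for j in range(len(colors))]
grand_total = sum(row_totals.values())

# Row percent: each cell as a share of its own row total.
row_percents = {
    flavor: [100 * c / row_totals[flavor] for c in counts]
    for flavor, counts in observed.items()
}
# Column percent: each column total as a share of all respondents.
col_percents = [100 * t / grand_total for t in col_totals]

for flavor, pcts in row_percents.items():
    print(flavor, row_totals[flavor], [f"{p:.2f}%" for p in pcts])
print("Column totals", col_totals, [f"{p:.2f}%" for p in col_percents])
```

The percentage in the "Row Totals" column of the table (for example, 34.93% for Cherry) is the row total divided by the grand total, which could be computed the same way as `col_percents`.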
Cross-Tabulation With Chi-Square Analysis
The Chi-square statistic is the primary statistic used for computing the statistical significance of the cross-tabulation table. Chi-square is used to test for statistical independence; that is, to see if the two variables are independent.
If the variables are independent (have no relationship), then the results of the statistical test will usually be "non-significant" and we "are not able to reject the null hypothesis," meaning that we have no evidence of a relationship between the variables.
If the variables are found to be related, then the results of the statistical test will be "significant" and we "are able to reject the null hypothesis", meaning that we can state that there is some relationship between the variables.
We use the chi-square statistic as the means of testing, or determining if the relationship is "statistically significant."
The chi-square statistic, along with the associated probability of chance observation, may be computed for any table. If the observed table relationships would occur by chance with very low probability when the variables are independent (say, 5% or less), then we say that the results are "statistically significant" at the ".05 or 5% level". This means that a table this far from independence would be unlikely to arise if the variables were truly unrelated. Depending on the cost of making mistakes, the researcher may apply more stringent criteria for declaring "significance", such as .01 or .005.
Students of statistics will recall that the probability values (.05 or .01) reflect the researcher's willingness to accept a type I error, that is, the probability of rejecting a true null hypothesis (meaning that we thought there was a relationship between the variables when there really wasn't). Furthermore, these error probabilities accumulate across tests: if 20 tables of truly unrelated variables are each tested at the .05 level, the researcher should expect about one table to be incorrectly found to have a relationship (20 x .05 = 1 expected false positive), and the chance of at least one such error is about 64% (1 - .95^20) when the tests are independent.
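The way type I error accumulates over many tables can be checked with a few lines of Python. This is a minimal sketch that assumes the tests are statistically independent and that every null hypothesis is true. The function names are illustrative.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of tables wrongly declared significant."""
    return alpha * n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one type I error across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


print(expected_false_positives(0.05, 20))          # about one table
print(f"{familywise_error_rate(0.05, 20):.3f}")    # roughly 0.64
```

Under these assumptions, testing 20 tables at the .05 level gives about one expected false positive and a chance of roughly 64% of at least one, rather than certainty.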
Computation of the Chi-Square Statistic for Cross-Tabulation Tables
The chi-square statistic is computed by first computing a chi-square value for each individual cell of the table and then summing them up to form a total value for the table. The chi-square value for the cell is computed as:
(observed value - expected value)^2 / (expected value)
Note that ^2 is notation for exponent 2 (squared).
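In symbols, the same computation might be written as follows. The notation is introduced here for illustration only: O_ij is the observed count in row i and column j, R_i is the row total, C_j is the column total, N is the grand total, and E_ij is the expected count under independence.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2_{ij} = \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\chi^2 = \sum_{i} \sum_{j} \chi^2_{ij}
```

For the "Cherry and Red" cell of the example, E = 51 x 56 / 146 = 19.5616, so the cell chi-square is (11 - 19.5616)^2 / 19.5616 = 3.7472.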
Chi-Square…
Computations in Gray
| Favorite Color | |||||
| Red | Blue | Green | Row Totals | ||
| Cherry | 11.0000 | 33.0000 | 7.0000 | 51.0000 | |
| Row Percent | 0.2157 | 0.6471 | 0.1373 | 0.3493 | |
| Expected | 19.5616 | 20.9589 | 10.4795 | ||
| Computation of Cell Expected Value | Column total percent (.3836) * 51 | Column total percent (.4110) * 51 | Column total percent (.2055) * 51 | ||
| Cell Chi-Square | 3.7472 | 6.9177 | 1.1553 | ||
| Computation | Cell x2 = (11-19.56)^2/19.56 | Cell x2 = (33-20.96)^2/20.96 | Cell x2 = (7-10.48)^2/10.48 | ||
| Grape | 23.0000 | 14.0000 | 9.0000 | 46.0000 | |
| Row Percent | 0.5000 | 0.3043 | 0.1957 | 0.3151 | |
| Expected | 17.6438 | 18.9041 | 9.4521 | ||
| Computation of Cell Expected Value | Column total percent (.3836) * 46 | Column total percent (.4110) * 46 | Column total percent (.2055) * 46 | ||
| Cell Chi-Square | 1.6260 | 1.2722 | 0.0216 | ||
| Computation | Cell x2 = (23-17.64)^2/17.64 | Cell x2 = (14-18.90)^2/18.90 | Cell x2 = (9-9.45)^2/9.45 | ||
| Raspberry | 22.0000 | 13.0000 | 14.0000 | 49.0000 | |
| Row Percent | 0.4490 | 0.2653 | 0.2857 | 0.3356 | |
| Expected | 18.7945 | 20.1370 | 10.0685 | ||
| Computation of Cell Expected Value | Column total percent (.3836) * 49 | Column total percent (.4110) * 49 | Column total percent (.2055) * 49 | ||
| Cell Chi-Square | 0.5467 | 2.5295 | 1.5352 | ||
| Computation | Cell x2 = (22-18.79)^2/18.79 | Cell x2 = (13-20.14)^2/20.14 | Cell x2 = (14-10.07)^2/10.07 | ||
| Column Total | 56.000 | 60.000 | 30.000 | 146.000 | Frequency |
| Column Total % | 0.3836 | 0.4110 | 0.2055 | 1.000 | |
| Chi-Square = | 19.3514 | = Sum of Cell Chi-Square Values |
| Degrees of Freedom | 4 | = (#Rows-1) * (#Columns-1) = (3-1)*(3-1) = 4 |
| Chi-Square Probability of Independence | 0.00067 | The probability for 19.3514 with 4 df is not computed here, but can be looked up in a chi-square probability distribution table in a research textbook |
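The whole computation in the table above can be sketched in plain Python. The first function follows the steps described in the text (expected value per cell, cell chi-square, sum, degrees of freedom). The second function is an addition made here, not part of the original procedure: instead of a textbook lookup, it uses a closed-form expression for the upper-tail chi-square probability that is valid only when the degrees of freedom are even, which happens to hold for this 3 x 3 table.

```python
import math

# Observed counts; rows are Cherry, Grape, Raspberry; columns are Red, Blue, Green.
observed = [
    [11, 33, 7],
    [23, 14, 9],
    [22, 13, 14],
]


def chi_square_table(table):
    """Return (chi-square, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi2_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable; closed form for EVEN df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


chi2, df = chi_square_table(observed)
p_value = chi2_upper_tail_even_df(chi2, df)
print(f"chi-square = {chi2:.4f}, df = {df}, p = {p_value:.5f}")
```

Run on the example counts, this should agree with the table's chi-square of about 19.35 on 4 degrees of freedom and a probability of about 0.00067. For tables with odd degrees of freedom, a statistics library or a printed table would be needed instead.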
In this example table, we observe that the chi-square value for the table is 19.3514. If the variables were independent, a value this large would occur by chance less than one time in 1,000. We therefore reject the null hypothesis of independence and conclude that there is a relationship between the variables.
We can observe the relationship in two places in the table. The most obvious is in the chi-square value computed for each cell. We observe that the cells "blue and cherry", "red and cherry" and "blue and raspberry" have the largest cell chi-square values, meaning that the number of observed respondents differed most from the number expected. When we compare the expected and observed frequencies, we further note that the "red and cherry" and "blue and raspberry" frequencies were lower than expected, while "blue and cherry" was higher than expected.
Because the cell chi-square and the expected values are often not displayed, these same relationships can be observed by comparing the column total percent to the cell percent (of the row total). In cell "Blue and Cherry" we would compare 41.10% with 64.71% and observe that more respondents preferred "Blue and Cherry" than expected.
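The comparison of row percent against column total percent can also be sketched in code. The helper below is hypothetical and written only for illustration: for each cell it returns the row percent minus the column total percent, so a positive value marks a cell with more respondents than independence would suggest and a negative value marks fewer.

```python
colors = ["Red", "Blue", "Green"]
observed = {
    "Cherry": [11, 33, 7],
    "Grape": [23, 14, 9],
    "Raspberry": [22, 13, 14],
}


def percent_deviations(table, column_names):
    """Map (row, column) to row percent minus column total percent."""
    col_totals = [sum(counts[j] for counts in table.values())
                  for j in range(len(column_names))]
    n = sum(col_totals)
    deviations = {}
    for row_name, counts in table.items():
        row_total = sum(counts)
        for j, name in enumerate(column_names):
            row_pct = 100 * counts[j] / row_total
            col_pct = 100 * col_totals[j] / n
            deviations[(row_name, name)] = row_pct - col_pct
    return deviations


deviations = percent_deviations(observed, colors)
# List cells from the largest absolute gap to the smallest.
for cell, gap in sorted(deviations.items(), key=lambda kv: -abs(kv[1])):
    direction = "more" if gap > 0 else "fewer"
    print(f"{cell[1]} and {cell[0]}: {gap:+.2f} points ({direction} than expected)")
```

The percentage gap is a quick visual check rather than a test statistic, so its ranking of cells need not match the ranking by cell chi-square exactly.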
Caution is urged when interpreting relationships found in any statistical analysis. We often want to "explain" results or infer "causality" from analyses and data that were not designed for, or do not have the power to support, such conclusions. In the current table we observe that "Blue and Cherry" was the most frequently observed combination of color and flavor preference. However, we must be careful about concluding that a "Blue Cherry" drink would be a success, or that color preference "causes" flavor preference. Blue and Cherry are the most preferred color and flavor, respectively, but they are most likely separate color and flavor concepts with no causal relationship.


