About Relating Data
Relate explores the relationships between variables. When you select two variables and then select Relate, Stats iQ will choose the appropriate statistical test based on the structure of the data, run that test, then translate the results into plain English.
When you select three or more variables, Stats iQ will relate each variable to the one variable that has the key by it, then bring the strongest relationships to the top. You can select dozens of variables at a time, so you can sift through many relationships quickly.
The Key Variable
The first variable selected from the variable pane will be the key variable. The key variable serves two functions:
- If more than two variables are selected (as described above), each non-key variable will be related to the one key variable (e.g., if you select ten variables, the one key variable will be related to each of the other nine, resulting in nine separate relate cards).
- The key variable is the “output” variable by default. For example, if you select “Age” and “Location,” it’s possible that “Age” (input) impacts “Location” (output), but it wouldn’t make sense for “Location” to impact “Age”; in this case you’d put the key by “Location.” (In many analyses this distinction doesn’t matter, but the input and output variables can always be swapped after creating the card.) If you want to make the key variable the input variable instead of the output variable, select the small arrows on the right side of the Relate button.
Relating Numbers and Numbers Variables
When you relate two numbers variables, Stats iQ shows a scatterplot. If the variables have many overlapping points on the scatterplot, Stats iQ will instead show a “binned” scatterplot where darker rectangles indicate a greater clustering of results. Stats iQ shows a line of best fit when the data indicates that the line will be useful (specifically, when the data doesn’t have outliers that might throw off the line).
To see the statistical details of any “relate” analysis results, click Show statistical test results. When relating two numbers variables, Stats iQ calculates a p-value and (for effect size) either a Pearson’s r or a Spearman’s rho. For more details on how Stats iQ chooses the statistical test, visit the Statistical Test Assumptions and Technical Details page.
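As a rough illustration of the two effect size measures named above, the sketch below computes Pearson’s r and Spearman’s rho with the Python standard library only. The function names and the sample data are hypothetical, and this is not Stats iQ’s implementation; it only shows that Spearman’s rho is Pearson’s r applied to ranks, which makes it less sensitive to outliers.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: covariance scaled by the spread of both variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: the last spend value is an outlier.
nights = [1, 2, 3, 4, 5]
spend = [10, 20, 30, 40, 500]

r = pearson_r(nights, spend)       # pulled below 1 by the outlier
rho = spearman_rho(nights, spend)  # ranks agree perfectly
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

In this made-up example the order of the two variables matches exactly, so the rank-based measure reports a perfect relationship while Pearson’s r does not.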
It’s possible that you’re less interested in whether the variables are correlated and more interested in which variable is higher on average. If the two variables are on similar scales, Stats iQ will provide an option at the top to switch from Correlation to Paired Difference, which allows you to compare averages.
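A paired difference looks at the gap between the two variables within each row rather than at how they move together. The sketch below is a minimal standard-library illustration of a paired t-statistic; the variable names and ratings are hypothetical, and Stats iQ may apply additional checks that are not shown here.

```python
from math import sqrt
from statistics import mean, stdev

def paired_difference(first, second):
    """Return the mean per-row difference and the paired t-statistic."""
    diffs = [a - b for a, b in zip(first, second)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs), mean(diffs) / standard_error

# Hypothetical ratings from the same five respondents, both on a 1-10 scale.
room_rating = [8, 7, 9, 6, 8]
food_rating = [6, 6, 7, 5, 7]

mean_diff, t_stat = paired_difference(room_rating, food_rating)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}")
```

Because both ratings come from the same respondents, the test works on the five differences instead of treating the two columns as independent groups.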
Relating Numbers and Categories Variables
When you relate a numbers variable and a categories variable, Stats iQ runs a statistical test and creates a table displaying each category’s count, average, median, and distribution of the numbers variable.
For example, you may wish to determine if guests at a hotel with children or guests without children are more satisfied on average. In this case, the “Children Present” variable is categorical, and “Satisfaction” is numeric.
The output of this statistical test can be seen by clicking Show statistical test results on the card. When the categories variable has only two categories, Stats iQ performs a t-test or a ranked t-test. When it has more, Stats iQ runs an ANOVA or a ranked ANOVA, as well as a Games-Howell post hoc test. For more details on how Stats iQ chooses the statistical test, visit the Statistical Test Assumptions and Technical Details page.
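For the two-category case, the following sketch computes a t-statistic for the hotel example. It assumes the unequal-variance (Welch) form of the t-test, which is an assumption made for illustration; the exact variant Stats iQ uses is covered on the technical details page. The satisfaction scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t-statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical satisfaction scores by "Children Present".
with_children = [7, 8, 6, 9, 7]
without_children = [5, 6, 7, 5, 6]

t_stat, dof = welch_t(with_children, without_children)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```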
Relating Categories and Categories Variables
When you relate two categories variables, Stats iQ creates a cross tab. By default, each column in the cross tab sums to 100%. In the example below, 69% of respondents in “USA” were “Returning” and 31% were “New.” You can select Row % to make the rows sum to 100%, Count to see the raw count in each cell, or All % to see the entire table sum to 100%. Alternatively, you can flip the rows with the columns entirely by selecting the ← at the top of the analysis result.
In the example below, since the columns sum to 100%, the question we are asking is, “What proportion of USA respondents were returning guests?” If we select Row % (or swap the columns and rows), we’re now asking “What proportion of returning guests were in the USA?” In this case, either of those questions could be useful to ask. Sometimes only one question will really be meaningful.
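The difference between the column and row views can be sketched in a few lines. The counts below are hypothetical, chosen only so that 69% of USA respondents are returning, and the function names are illustrative rather than part of Stats iQ.

```python
# Hypothetical counts: rows are guest type, columns are location.
row_labels = ["Returning", "New"]
col_labels = ["USA", "UK"]
counts = [
    [69, 30],  # Returning
    [31, 10],  # New
]

def column_percent(table):
    """Each column sums to 100: what share of a location is each guest type?"""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * cell / col_totals[j] for j, cell in enumerate(row)]
            for row in table]

def row_percent(table):
    """Each row sums to 100: what share of a guest type is in each location?"""
    return [[100 * cell / sum(row) for cell in row] for row in table]

def all_percent(table):
    """The whole table sums to 100."""
    total = sum(sum(row) for row in table)
    return [[100 * cell / total for cell in row] for row in table]

col_pct = column_percent(counts)
row_pct = row_percent(counts)

# Same cell, two different questions.
print(f"{col_pct[0][0]:.1f}% of USA respondents were returning guests")
print(f"{row_pct[0][0]:.1f}% of returning guests were in the USA")
```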
Green and red arrows within cells indicate whether the value of a cell is statistically higher or lower than you would expect if there were no relationship between the variables. If Col % is selected, the arrows compare that cell’s number to the other numbers in that row. More arrows correspond to a higher degree of statistical significance. Cells with high numbers in them appear darker than other cells.
The output of the statistical test can be seen by clicking Show statistical test results on the card. Stats iQ performs either a Fisher’s Exact Test or a Chi-Squared test when two categorical variables are related. Up to three arrows will be shown in a cell, depending on the p-value calculated from the adjusted residual of the cell. For more details on how Stats iQ chooses the statistical test, visit the Statistical Test Assumptions and Technical Details page.
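The adjusted residual mentioned above measures how far a cell’s count is from the count expected if the two variables were unrelated. The sketch below uses the standard textbook formula with hypothetical counts and converts each residual to a two-sided p-value from the normal distribution. The p-value cut-offs that map to one, two, or three arrows are not stated here, so the sketch stops at the p-value.

```python
from math import sqrt
from statistics import NormalDist

def adjusted_residuals(table):
    """Adjusted standardized residual for every cell of a count table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            spread = sqrt(expected
                          * (1 - row_totals[i] / n)
                          * (1 - col_totals[j] / n))
            out_row.append((observed - expected) / spread)
        result.append(out_row)
    return result

def two_sided_p(z):
    """Two-sided p-value for a residual treated as a standard normal score."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts: rows are Returning/New, columns are USA/UK.
counts = [[69, 30], [31, 10]]
residuals = adjusted_residuals(counts)
p_usa_returning = two_sided_p(residuals[0][0])
print(f"residual = {residuals[0][0]:.2f}, p = {p_usa_returning:.2f}")
```

A negative residual means the cell holds fewer respondents than expected, and a positive one means more.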
In addition to the general cross tab, Stats iQ will also generate a Pairwise Comparison table, which compares the values of pairs of categories in a given row. For example, the cross tab below shows the proportion of clients who are returning visitors from various locations. The Pairwise Comparison table shows, for example, that the UK has a 6 percentage-point higher proportion of returning visitors than the USA. The green and red arrows on cells indicate statistically significant differences.
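A pairwise comparison of this kind can be sketched as a difference between two proportions. The example below assumes a pooled two-proportion z-test, which is an assumption made for illustration rather than a confirmed description of how Stats iQ fills the table. The counts are hypothetical and chosen to give a 6 percentage-point gap.

```python
from math import sqrt
from statistics import NormalDist

def compare_proportions(hits_a, n_a, hits_b, n_b):
    """Percentage-point difference (a minus b), z-score, and two-sided p-value."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 100 * (p_a - p_b), z, p_value

# Hypothetical counts: 30 of 40 UK visitors and 69 of 100 USA visitors return.
diff_points, z_score, p_value = compare_proportions(30, 40, 69, 100)
print(f"UK minus USA = {diff_points:.0f} points, z = {z_score:.2f}, p = {p_value:.2f}")
```

With groups this small, a 6-point gap is well within what chance alone could produce, so no arrow would be expected for this made-up pair.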
Relating Checkboxes and Numbers Variables
Stats iQ displays a table with two rows for each checkbox: one for if the box was checked and one for if it was not. For example, if one of the checkboxes represents whether or not a respondent used the pool, there will be a row for using (checked) and not using (unchecked) the pool, along with the average satisfaction scores of respondents who fall into either of those two groups.
This table, like most in Stats iQ, can be sorted. For example, you might want to sort by average or by whether the box was checked or not. Click the column header (e.g., Average) to sort the table by the values in that column.
Although the table will display statistical information such as median and average, there are no statistical tests performed in this situation. To run a separate analysis comparing the averages of those who used the pool versus those who did not:
Relating Checkboxes and Categories Variables
Depending on which variable had the key by it, one of the first two columns will contain the categories variable options and the other will contain the checkbox options. The “%” column will indicate the proportion of the first column group that selected the second column group.
In the example below, the first row indicates the following:
- There were 1663 respondents who are new customers.
- Of those 1663 respondents, 359 used the pool.
- That means 21.6% of the 1663 respondents used the pool.
- The red arrows in the last column indicate that this is a lower than typical proportion.
The arrows in the last column are calculated in the same way as in the cross tab for categorical variables, discussed previously.
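The “%” column in the example row is simple arithmetic, sketched here with a hypothetical helper function:

```python
def percent_selected(group_size, selected_count):
    """Share of the first-column group that selected the second-column option."""
    return 100 * selected_count / group_size

new_customers = 1663   # respondents who are new customers
used_pool = 359        # of those, respondents who checked "used the pool"

share = percent_selected(new_customers, used_pool)
print(f"{share:.1f}%")  # 21.6%
```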
Relating Numbers and Times Variables
When you relate a numbers variable and a times variable, Stats iQ will create a chart that shows how the numbers variable has varied over time. To change the bin size (from days to weeks, for example), click Bin Size above the chart.
In addition to the date bins, Stats iQ will display a line for a specific statistical value over time. The default value is the mean. Selecting a different option at the top of the chart (Median, Min, or Max) will change which value is represented as a line on the chart. Adjusting the slider below the graph will narrow the date range displayed.
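The binning and summary line described above can be approximated as follows. This is a minimal sketch that assumes weekly bins keyed by ISO week; the function name and the sample records are hypothetical, and the summary statistic is passed in so that the mean can be swapped for the median, minimum, or maximum.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

def summarize_by_week(records, stat=mean):
    """Group (date, value) pairs by ISO week; return count and stat per bin."""
    bins = defaultdict(list)
    for day, value in records:
        iso_year, iso_week, _ = day.isocalendar()
        bins[(iso_year, iso_week)].append(value)
    return {week: (len(vals), stat(vals)) for week, vals in sorted(bins.items())}

# Hypothetical satisfaction scores with their response dates.
records = [
    (date(2024, 1, 1), 6),
    (date(2024, 1, 3), 8),
    (date(2024, 1, 8), 9),
    (date(2024, 1, 10), 4),
    (date(2024, 1, 12), 5),
]

weekly_mean = summarize_by_week(records)            # default line: the mean
weekly_median = summarize_by_week(records, median)  # alternative line
weekly_max = summarize_by_week(records, max)
for week, (count, value) in weekly_mean.items():
    print(week, count, value)
```

Changing the bin size corresponds to changing the grouping key, for example using the date itself for daily bins.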
The output of this statistical test can be seen by clicking Show statistical test results on the card. Stats iQ runs the same statistical tests it would run if the times variable were a numbers variable. In particular, this means that Stats iQ will run a correlation between the variables.
Relating Times and Categories Variables
When you relate a times variable and a categories variable, Stats iQ will create a chart that shows how the counts of those categories have changed over time. To change the bin size (from days to weeks, for example), click Bin Size above the chart.
For this type of card, you can choose the type of chart that is displayed by selecting Bar, Line, or Area above the chart. You can also display the data as a Percent or a Count by selecting the corresponding option at the top of the chart. Percent is particularly useful for seeing how the distribution of groups has changed over time. No statistical tests are run for this type of card.
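The difference between the Count and Percent views can be sketched as below. The month labels, categories, and function names are hypothetical; the point is only that Percent rescales each time bin to 100 so that bins with different totals can be compared.

```python
from collections import Counter, defaultdict

def counts_by_bin(records):
    """Count how often each category appears in each time bin."""
    bins = defaultdict(Counter)
    for bin_label, category in records:
        bins[bin_label][category] += 1
    return dict(bins)

def to_percent(counter):
    """Rescale one bin's counts so that they sum to 100."""
    total = sum(counter.values())
    return {category: 100 * n / total for category, n in counter.items()}

# Hypothetical responses, already assigned to monthly bins.
records = [
    ("2024-01", "New"), ("2024-01", "Returning"),
    ("2024-01", "Returning"), ("2024-01", "Returning"),
    ("2024-02", "New"), ("2024-02", "Returning"),
]

monthly_counts = counts_by_bin(records)
monthly_percent = {month: to_percent(c) for month, c in monthly_counts.items()}
print(monthly_counts["2024-01"]["Returning"], monthly_percent["2024-01"]["Returning"])
```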
Statistical Tests in Stats iQ
Stats iQ chooses statistical tests based on the variable types and structure of the columns being analyzed. For reference, this is a full list of the non-regression statistical tests and effect size measures in Stats iQ:
- T-test (2 Categories vs. Numbers)
- ANOVA (3+ Categories vs. Numbers)
- Games-Howell post hoc tests (3+ Categories vs. Numbers)
- Cohen’s f
- Correlation (Numbers vs. Numbers)
- Pearson correlation
- Spearman correlation
- Point Biserial correlation
- Cohen’s d
- Paired t-test (Numbers vs. Numbers)
- Fisher’s Exact Test (2 Categories vs. 2 Categories)
- Chi-squared (3+ Categories vs. Categories)
- Cramer’s V
- Z-test (Categories vs. Categories)
- Time-series analysis
- Difference in differences (DID, DD)
Choosing Statistical Tests
Stats iQ will pick the statistical test for you based on its understanding of the data (e.g., whether a variable is a numbers variable or a categories variable). However, you can change a variable’s type to trigger a different test.
For example, you could relate a 1/0 variable to a variable on a 1-7 scale. If the 1/0 variable is treated as categorical, the result is a t-test. If it’s treated as numeric, the result is a correlation (the results of those two analyses will be very similar).
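The similarity between the two analyses can be checked numerically. The sketch below assumes a pooled-variance t-test, for which the t-statistic and Pearson’s r on the 1/0 coding are linked exactly by t = r * sqrt((n - 2) / (1 - r^2)). Stats iQ may use a different t-test variant, in which case the two results would be close rather than identical. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r between two equally long lists of numbers."""
    mean_x, mean_y = mean(xs), mean(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def pooled_t(group_a, group_b):
    """Two-sample t-statistic with a pooled variance estimate."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    ss = (sum((x - mean_a) ** 2 for x in group_a)
          + sum((x - mean_b) ** 2 for x in group_b))
    pooled_var = ss / (na + nb - 2)
    return (mean_a - mean_b) / sqrt(pooled_var * (1 / na + 1 / nb))

# Hypothetical data: a 1/0 flag and a rating on a 1-7 scale.
flag = [1, 1, 1, 1, 0, 0, 0, 0]
rating = [6, 7, 5, 7, 4, 5, 3, 5]

# Treat the flag as numeric: correlation, then convert r to a t-statistic.
r = pearson_r(flag, rating)
n = len(flag)
t_from_r = r * sqrt((n - 2) / (1 - r * r))

# Treat the flag as categorical: compare the two groups directly.
checked = [y for f, y in zip(flag, rating) if f == 1]
unchecked = [y for f, y in zip(flag, rating) if f == 0]
t_direct = pooled_t(checked, unchecked)

print(f"r = {r:.3f}, t from r = {t_from_r:.3f}, t from groups = {t_direct:.3f}")
```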
Stats iQ will run a “ranked” relationship if numeric data isn’t normally distributed or has outliers. If you’d rather see the “unranked” relationship (or vice versa), that option is available in the statistical test results. For more details on ranked tests, visit the Statistical Test Assumptions and Technical Details page.
Multiple Comparisons Problem
The Multiple Comparisons problem can occur if you use the “relate” analysis with a large number of non-key variables selected. At the default significance threshold of 0.05, you’re likely to see about 5% of the results for unrelated variables show up as statistically significant through pure luck rather than because of a meaningful relationship. This is a necessary consequence of the way statistical analysis works.
In Stats iQ, if you run many analyses at once and see results where the p-value is only narrowly significant (e.g., 0.03 instead of 0.00004), treat those results with caution: they may reflect chance rather than a real relationship.
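The scale of the problem can be estimated with a short calculation. The sketch below assumes that none of the variables is truly related to the key variable and that the tests are independent. The Bonferroni threshold at the end is a common general-purpose correction shown only for comparison; it is not described here as a Stats iQ feature.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Average number of chance 'significant' results among unrelated variables."""
    return num_tests * alpha

def chance_of_any_false_positive(num_tests, alpha=0.05):
    """Probability that at least one unrelated variable looks significant."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_threshold(num_tests, alpha=0.05):
    """Stricter per-test threshold that keeps the overall error rate near alpha."""
    return alpha / num_tests

num_tests = 100  # hypothetical: 100 non-key variables related to one key variable
expected = expected_false_positives(num_tests)
any_false = chance_of_any_false_positive(num_tests)
threshold = bonferroni_threshold(num_tests)

print(f"expected chance findings: {expected:.0f}")
print(f"chance of at least one: {any_false:.3f}")
print(f"p = 0.03 passes stricter threshold: {0.03 < threshold}")
print(f"p = 0.00004 passes stricter threshold: {0.00004 < threshold}")
```

Under these assumptions, a p-value of 0.03 no longer stands out once 100 relationships are examined together, while 0.00004 still does.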
Translating Statistics into Sentences
Stats iQ produces plain English sentences to explain the results of the Relate analysis.
If the p-value is not below the threshold of statistical significance (the default for this in Stats iQ is 0.05), the sentences will explain that there is not a statistically significant relationship.
If the p-value is below the threshold, Stats iQ will then look at the effect size. Depending on the effect size, Stats iQ will add words to the sentence like “weak” or “strong” to characterize the relationship. More information on how Effect Size and p-value are interpreted can be found by clicking the information (i) button under Show statistical test results.
The table below outlines how we would describe variable relationships for t-tests based on Effect Size.
|Effect Size|Effect Size Interpretation|Stats iQ Language|
|---|---|---|
|Below 0.2|Trivial or no effect|There is no statistically significant relationship between the variables.|
|Between 0.2 and 0.5|Small effect|Variables are statistically related. We wouldn’t use an extra adjective to characterize their relationship.|
|Between 0.5 and 0.8|Medium effect|Variables are statistically related. We wouldn’t use an extra adjective to characterize their relationship.|
|Above 0.8|Large effect|Variables are “strongly” related.|
Depending on the type of statistical test used, the Effect Size thresholds will be slightly different. However, the same general pattern applies.
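The decision logic for t-tests can be sketched as a small function that checks the p-value first and the effect size second. The returned strings are shortened paraphrases rather than Stats iQ’s actual sentences, and the handling of values that fall exactly on 0.2 or 0.8 is an assumption, since the table does not say which row those belong to.

```python
def describe_relationship(p_value, effect_size, alpha=0.05):
    """Map a t-test's p-value and effect size to a plain-English label."""
    if p_value >= alpha:
        # The p-value is not below the significance threshold.
        return "no statistically significant relationship"
    size = abs(effect_size)
    if size < 0.2:
        # Trivial or no effect.
        return "no statistically significant relationship"
    if size <= 0.8:
        # Small or medium effect: no extra adjective (boundary is an assumption).
        return "statistically related"
    # Large effect.
    return "strongly related"

print(describe_relationship(p_value=0.20, effect_size=1.0))
print(describe_relationship(p_value=0.01, effect_size=0.6))
print(describe_relationship(p_value=0.01, effect_size=-0.9))
```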