Why use survey statistical analysis methods?
When it comes to survey data, collection is only half the picture. What you do with your results can make the difference between uninspiring top-line findings and deep, revelatory insights. Using data processing tools and techniques like statistical tests can help you discover:
- whether the trends you see in your data are meaningful or just happened by chance
- what your results mean in the context of other information you hold
- whether one factor affecting your business is more important than others
- what your next research question should be
- how to generate insights that lead to meaningful changes
There are several types of statistical analysis for surveys. In this article, we’ll explore some of the most common methods presently used, and provide links to more in-depth explainers from the Qualtrics team.
Benchmarking
Benchmarking is a way of standardizing – leveling the playing field – so that your data and results are meaningful in context. It involves taking outside factors into account so that you can adjust the parameters of your research and have a more precise understanding of what’s happening.
Benchmarking techniques use weighting to adjust for variables that may affect overall results. For example, imagine you’re interested in the growth of crops over a season. Your benchmarking will take into account variables that have had an effect on crop growth, such as rainfall, hours of sunlight, any pests or diseases, type and frequency of fertilizer etc., so that you can adjust for anything unusual that might have happened, such as an unexpected plant disease outbreak on a single farm within your sample.
With benchmarks in place, you have a reference for what is “standard” in your area of interest, so that you can better identify and investigate variance from the norm.
The goal, as in so much of survey data analysis, is to make sure that your sample is representative, rather than skewed, and that any comparisons with other data are like-for-like.
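One simple form of the weighting mentioned above is post-stratification, where each respondent group is weighted by its share of the population divided by its share of the sample. The sketch below is only an illustration: the group names, scores and the 50/50 population split are made-up assumptions, not figures from any real study.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Return one weight per group: population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}

def weighted_mean(values, groups, weights):
    """Mean of `values` where each response counts by its group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight

# Hypothetical data: satisfaction scores from 6 urban and 2 rural respondents
groups = ["urban"] * 6 + ["rural"] * 2
scores = [8, 7, 9, 8, 7, 9, 4, 6]
# Assumed benchmark: the real population is split 50/50
population_shares = {"urban": 0.5, "rural": 0.5}

weights = poststratification_weights(groups, population_shares)
unweighted = sum(scores) / len(scores)
adjusted = weighted_mean(scores, groups, weights)
print(unweighted, adjusted)
```

Because rural respondents are under-represented in this made-up sample, their answers count for more after weighting, and the adjusted mean moves toward their scores.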
Regression
Regression is a statistical technique used for working out the relationship between two (or more) variables.
To understand regressions, we need a quick terminology check:
- Independent variables are “standalone” phenomena (in the context of the study) that influence dependent variables
- Dependent variables are things that change as a result of their relationship to independent variables
A change in a dependent variable – let’s say, crop growth during August – depends on, and is associated with, a change in one (or more) independent variables – which in the crop example may be sunshine, rainfall and pollution levels.
- Simple linear regression uses a single independent variable to predict the value of the dependent variable.
- Multiple regression uses at least two independent variables. A multiple regression can be linear or non-linear.
The results from a linear regression analysis are typically shown as a scatter plot with the variables on the axes and a straight ‘regression line’ fitted through the points to show the relationship between them. Real data rarely falls exactly on the line, so the points scatter around it. Where the underlying relationship is itself curved, a non-linear regression is the better choice.
This is a useful technique because it lets you estimate how much the dependent variable changes for a given change in your independent variable.
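As a minimal sketch of a regression with one independent variable, the function below fits a straight line by ordinary least squares. The sunshine and growth figures are invented for illustration, and `fit_line` is a hypothetical helper name, not part of any particular product.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of x and y, and variation of x, around their means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: weekly hours of sunshine vs. crop growth in cm
sunshine = [20, 25, 30, 35, 40]
growth = [4.1, 5.0, 5.9, 7.2, 7.8]

slope, intercept = fit_line(sunshine, growth)
print(f"growth = {intercept:.2f} + {slope:.3f} * sunshine")
```

The slope is the estimated change in the dependent variable (growth) for each extra unit of the independent variable (an hour of sunshine).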
T-test
The T-test (aka Student’s T-test) is a tool for comparing the mean values of two groups of data. For example, do women and men have different mean heights? The T-test lets you judge whether an observed difference is statistically significant or could plausibly be down to chance.
The results of a T-test are expressed in terms of probability (p-value): the probability of seeing a difference at least as large as the one in your sample if there were really no difference between the groups. If the p-value is below a chosen threshold, usually 0.05, the difference is treated as statistically significant, meaning it is unlikely to be just chance variation in your sample data.
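To make the comparison concrete, here is a rough sketch of the calculation behind a two-sample Student’s T-test, assuming both groups have similar variance. The height values are invented, and `students_t` is a hypothetical helper. It returns only the t statistic and degrees of freedom. Turning those into a p-value needs the t distribution, which a statistics library such as SciPy (`scipy.stats.ttest_ind`) would normally handle.

```python
import math
import statistics

def students_t(a, b):
    """Return Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / std_error
    return t, df

# Hypothetical heights in cm
men = [178, 182, 175, 180, 185]
women = [165, 170, 162, 168, 160]

t, df = students_t(men, women)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

With 8 degrees of freedom, a t statistic beyond roughly 2.31 in either direction corresponds to a two-sided p-value below 0.05, so a larger value would count as statistically significant at that threshold.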
Analysis of variance (ANOVA) test
Like the T-test, ANOVA (analysis of variance) is a way of testing the differences between groups to see if they’re statistically significant. However, ANOVA allows you to compare three or more groups rather than just two.
ANOVA can be used alongside a regression study to find out what effect independent variables have on the dependent variable. It compares multiple groups simultaneously to see whether their mean values differ, e.g. studying whether different types of advertisements get different consumer responses.
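A one-way ANOVA boils down to a single number, the F statistic, which compares how much the group means differ from each other with how much individual responses vary inside each group. The sketch below assumes three made-up sets of advertisement ratings, and `one_way_anova_f` is a hypothetical helper name.

```python
def one_way_anova_f(groups):
    """Return the F statistic and the (between, within) degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the overall mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical response ratings for three advertisements
ad_a = [4, 5, 6]
ad_b = [6, 7, 8]
ad_c = [8, 9, 10]

f, df_between, df_within = one_way_anova_f([ad_a, ad_b, ad_c])
print(f"F = {f:.1f} on ({df_between}, {df_within}) degrees of freedom")
```

A large F value suggests that at least one group mean differs from the others. It does not say which one, so follow-up comparisons are usually needed.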
Go deeper: What is Analysis of Variance (ANOVA)?
Cluster analysis
Cluster analysis is a way of processing datasets by identifying how closely related the individual data points are. Using cluster analysis, you can identify whether there are defined groups (clusters) within a large pool of data, or if the data is quite evenly spread out.
Cluster analysis comes in a few different forms, depending on the type of data you have and what you’re looking to find out. It can be used in an exploratory way, such as discovering clusters in survey data around demographic trends or preferences, or to confirm and clarify existing hypotheses. It’s one of the more popular statistical techniques in market research, since it can be used to uncover market segments and customer groups.
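One widely used form of cluster analysis is k-means, which repeatedly assigns each data point to its nearest cluster center and then moves each center to the average of its points. The sketch below is a bare-bones version for illustration only: the respondents (age, monthly spend) and the starting centers are invented, and real projects would normally standardize the variables and use a tested library.

```python
import math

def k_means(points, centroids, iterations=10):
    """Basic k-means: assign points to the nearest centroid, then recompute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical respondents: (age, monthly spend in dollars)
respondents = [(22, 30), (25, 35), (24, 28), (58, 90), (61, 85), (63, 95)]
# Assumed starting guesses for two cluster centers
start = [(20, 20), (60, 100)]

centroids, clusters = k_means(respondents, start)
print(centroids)
print(clusters)
```

Here the two clusters correspond to a younger, lower-spending group and an older, higher-spending group, which is the kind of segment a market researcher might then investigate further.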
Factor analysis
Factor analysis is a way to reduce the complexity of your research findings by trading a large number of initial variables for a smaller number of deeper, underlying ones. In performing factor analysis, you uncover “hidden” factors that explain variance (difference from the average) in your findings.
Because it looks beneath the surface of your data for underlying drivers, it’s also a form of research in its own right, as it gives you access to drivers of results that can’t be directly measured. Keep in mind that the factors it finds describe patterns of correlation and do not by themselves prove causation.
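In the standard common factor model, this idea is usually written as follows. The symbols are introduced here for illustration and do not come from the article: each of the p measured variables x_i is a weighted mix of k hidden factors f_1 to f_k, with weights (loadings) λ, plus a part ε_i unique to that variable.

```latex
x_i = \lambda_{i1} f_1 + \lambda_{i2} f_2 + \dots + \lambda_{ik} f_k + \varepsilon_i,
\qquad i = 1, \dots, p, \quad k < p
```

The analysis estimates the loadings, and variables that load heavily on the same factor are read as measuring the same underlying driver.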
Conjoint analysis
Market researchers love to understand and predict why people make the complex choices they do. Conjoint analysis comes closest to doing this: it asks people to make trade-offs when making decisions, just as they do in the real world, then analyzes the results to find the most popular outcome.
For example, an investor wants to open a new restaurant in a town. They think one of the following options might be the most profitable:
| Type of restaurant | Gourmet burger | Spanish tapas | Thai |
| --- | --- | --- | --- |
| Average price per head | $20 | $40 | $60 |
| Distance from town center | 5 miles | 2 miles | 10 miles |
| What does customer’s partner feel? | It’s OK | It’s OK | Loves it! |
| Trade-offs | It’s cheap, fairly near home, partner is just OK with it | It’s a bit more expensive but very near home, partner is just OK with it | It’s expensive, quite far from home but partner loves it |
The investor commissions market research. The options are turned into a survey for the residents:
- Which type of restaurant do you prefer? (Gourmet burger/Spanish tapas/Thai)
- What would you be prepared to spend per head? ($20, $40, $60)
- How far would you be willing to travel? (5 miles, 2 miles, 10 miles)
- Would your partner…? (Love it, be OK with it)
There are lots of possible combinations of answers – 54 in this case: (3 restaurant types) x (3 price levels) x (3 distances) x (2 partner preferences). Once the survey data is in, conjoint analysis software processes it to figure out how important each option is in driving customer decisions, which levels for each option are preferred, and by how much.
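The count of 54 can be checked by listing every combination of levels. This short sketch uses the attribute levels from the example, with distances in miles as in the table. The dictionary keys are labels chosen here for illustration.

```python
from itertools import product

# Attribute levels taken from the restaurant example
attributes = {
    "restaurant": ["Gourmet burger", "Spanish tapas", "Thai"],
    "price_per_head": ["$20", "$40", "$60"],
    "distance": ["5 miles", "2 miles", "10 miles"],
    "partner": ["Loves it", "OK with it"],
}

# Every possible profile is one choice of level per attribute
profiles = [dict(zip(attributes, combo)) for combo in product(*attributes.values())]

print(len(profiles))  # 3 * 3 * 3 * 2 = 54
print(profiles[0])
```

In practice, respondents are usually shown only a subset of these profiles, and the software estimates preferences for the rest from their choices.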
So, from conjoint analysis, the restaurant investor may discover that there’s a preference for an expensive Spanish tapas bar on the outskirts of town – something they may not have considered before.
Get more details: What is a conjoint analysis? Conjoint types and when to use them
Crosstab analysis
Crosstab (cross-tabulation) is used in quantitative market research to analyze categorical data – that is, variables whose values fall into distinct, mutually exclusive categories, such as ‘men’ and ‘women’, or ‘under 30’ and ‘over 30’.
Also known as a contingency table or data tabulation, it allows you to compare the relationship between two variables by presenting them in easy-to-understand tables.
A statistical method called chi-squared can be used to test whether the variables in a crosstab analysis are independent or not.
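As a rough illustration, the sketch below builds a crosstab from two lists of categorical answers and computes the Pearson chi-squared statistic by comparing each observed count with the count expected if the two variables were unrelated. The age groups and product preferences are invented, and the helper names are hypothetical. A library routine such as SciPy’s `scipy.stats.chi2_contingency` would also return the p-value.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(zip(rows, cols))

def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a crosstab."""
    row_totals, col_totals = Counter(), Counter()
    for (r, c), n in table.items():
        row_totals[r] += n
        col_totals[c] += n
    total = sum(table.values())
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[r] * col_totals[c] / total
            stat += (table[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical survey: age group and preferred product (A or B)
age = ["under 30"] * 20 + ["over 30"] * 20
preference = ["A"] * 15 + ["B"] * 5 + ["A"] * 5 + ["B"] * 15

table = crosstab(age, preference)
stat, df = chi_squared(table)
print(dict(table))
print(f"chi-squared = {stat:.1f}, df = {df}")
```

For a 2 x 2 table (1 degree of freedom), a statistic above about 3.84 corresponds to a p-value below 0.05, which would suggest the two variables are not independent.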
Text analysis and sentiment analysis
Analyzing human language is a relatively new form of data processing, and one that offers huge benefits in experience management. As part of the Stats iQ package, TextiQ from Qualtrics uses machine learning and natural language processing to parse and categorize data from text feedback, assigning positive, negative or neutral sentiment to customer messages and reviews.
With this data from text analysis in place, you can then employ statistical tools to analyze trends, make predictions and identify drivers of positive change.
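To give a feel for what assigning sentiment means, here is a deliberately simple, word-list-based sketch. It is not how Text iQ or any machine learning system works: the word lists, the `label_sentiment` helper and the feedback messages are all made up for illustration.

```python
from collections import Counter

# Tiny hand-made word lists, purely for illustration
POSITIVE = {"great", "love", "friendly", "fast", "helpful"}
NEGATIVE = {"slow", "rude", "broken", "poor", "late"}

def label_sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical customer feedback
feedback = [
    "Great service and friendly staff!",
    "Delivery was late and the box was broken.",
    "I ordered on Tuesday.",
]

labels = [label_sentiment(t) for t in feedback]
print(Counter(labels))
```

Once every message carries a label like this, the counts can be fed into the statistical methods described earlier, for example a crosstab of sentiment by customer segment.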
The easy way to run statistical analysis
Our Stats iQ product can perform the most complicated statistical tests at the touch of a button using our online survey software, or data brought in from other sources. Turn your data into insights and actions with CoreXM and Stats iQ™. Powerful statistical analysis. No stats degree required.