What do we mean by survey errors?
In everyday language, error just means a mistake. But when it comes to data, the definition is a little broader. As well as the human mistakes in surveying, when we talk about error we’re talking about degrees of accuracy and certainty in our data.
Errors are sources of uncertainty, both in the estimates in the data and the conclusions we draw from that data. The goal of a survey is usually to make inferences about a larger population of interest – using a sample to research a population. Evaluations of survey data quality typically reflect how successful the project has been in doing that.
Survey errors reduce, but don’t necessarily eliminate, our ability to accurately make inferences about the larger population from our sample. As a result, understanding survey errors is key to understanding survey data quality.
Increased error typically results in wider confidence intervals (reduced certainty) around the estimates in the data and the inferences made about the population of interest. If these confidence intervals grow too wide, the data and inferences can become uninformative, which means the time and energy you spent conducting the survey are wasted.
Where do errors in surveying come from?
We can sort survey error into two types – sampling and non-sampling error.
Sampling error is a natural effect of using a sample to study a larger population. It’s the extent to which the sample differs from the population. Some degree of sampling error is inevitable, but it doesn’t mean survey research is not worth doing. It just means there will be a margin of error in your results.
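To make the margin of error concrete, here is a minimal sketch using the standard normal-approximation formula for a proportion. It assumes a simple random sample. The function name `margin_of_error` and the example sample sizes are illustrative only, and a weighted or clustered design would need a design-adjusted calculation.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    p: observed proportion (0..1); n: number of respondents.
    z = 1.96 corresponds to a 95% confidence level.
    Assumes a simple random sample (an assumption of this sketch).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# p = 0.5 is the most conservative case (widest interval).
for n in (100, 400, 1600):
    print(n, margin_of_error(0.5, n))
```

Under these assumptions, quadrupling the sample size only halves the margin of error, which is why sampling error can be reduced but rarely removed.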
Non-sampling error is where there’s a problem with how the survey is set up and carried out. It includes errors in questionnaire design like question order bias, where the wording or theme of previous questions affects how a respondent answers later in the survey. It also covers problems like nonresponse bias, where the sample is balanced, but some groups are disproportionately represented among the people who actually answer your survey. For example, if you surveyed equal numbers of working and non-working parents, you might get more responses from the non-working ones because they’re not quite as busy.
The Total Survey Error model
The Total Survey Error (TSE) model is a helpful framework for understanding sources of error and their effects on survey estimates and inferences.
In this framework, the mean squared error (MSE) is used to summarize all of the variable errors (variance) and biases for a particular survey estimate: it equals the variance of the estimate plus the square of its bias. These errors are specific to a survey estimate or statistic. In practice the MSE is rarely measured comprehensively and precisely, but the goal is to estimate it as accurately as possible.
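The way MSE combines variable error and bias can be checked numerically. The sketch below assumes a hypothetical situation in which the same survey was repeated several times and the true population value is known, which is almost never the case in practice. The estimates are made-up numbers for illustration.

```python
from statistics import mean, pvariance


def mse_decomposition(estimates, true_value):
    """Return (bias, variance, mse) for a set of repeated survey estimates.

    bias     : average estimate minus the true value
    variance : spread of the estimates around their own average
    mse      : average squared distance from the true value
    """
    estimates = list(estimates)
    bias = mean(estimates) - true_value
    variance = pvariance(estimates)
    mse = mean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse


# Hypothetical estimates (e.g. % satisfied) from four repeated surveys.
bias, variance, mse = mse_decomposition([52.0, 54.0, 56.0, 58.0], true_value=50.0)
print(bias, variance, mse)

# MSE equals variance plus squared bias.
assert abs(mse - (variance + bias ** 2)) < 1e-9
```

A survey can therefore have a tight spread of estimates and still a large MSE if it is consistently off in one direction.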
Using the TSE framework, survey errors can be classified in three broad categories – errors of non-observation, errors of observation, and errors of processing.
The list in each category of error is not exhaustive as there are many potential sources of errors in surveys. The data collection method influences many sources of error and is often the primary focus for efforts aimed at reducing error.
For example, to reduce nonresponse error a researcher may devote a larger portion of her budget to incentives, but this budgetary decision will have implications for sample size, which affects other sources of error.
When applying the TSE framework to survey design decisions, it is important to make every tradeoff explicitly and with as much information as possible. This will allow you to assess and account for the level of error associated with each design decision. The goal for most researchers will be to minimize error (maximize quality) within the constraints of a particular budget.
To determine the approach that will minimize TSE, the researcher must assess the likely level of error for each possible alternative procedure in the flow of survey design.
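The incentive-versus-sample-size tradeoff described above can be laid out explicitly before a decision is made. This is a rough sketch of the sampling-error side only. The budget, base cost per complete, and incentive levels are hypothetical figures, and the helper names are invented for illustration.

```python
import math


def affordable_completes(budget: float, base_cost: float, incentive: float) -> int:
    """Number of completed responses a fixed budget can pay for."""
    cost_per_complete = base_cost + incentive
    if cost_per_complete <= 0:
        raise ValueError("cost per complete must be positive")
    return int(budget // cost_per_complete)


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a proportion, assuming a simple random sample."""
    return z * math.sqrt(p * (1.0 - p) / n)


BUDGET = 10_000.0   # hypothetical total fieldwork budget
BASE_COST = 5.0     # hypothetical cost per complete before incentives

for incentive in (0.0, 5.0, 15.0):
    n = affordable_completes(BUDGET, BASE_COST, incentive)
    print(f"incentive={incentive:5.2f}  completes={n:5d}  margin=+/-{margin_of_error(n):.3f}")
```

The table this prints shows what each incentive level costs in sampling precision. Whether the expected reduction in nonresponse error is worth that cost has to come from the literature or from prior data collection efforts.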
How to use the TSE framework
The success of applying the TSE framework depends on having good information about the costs and errors associated with each step and decision of the survey process.
This information may be theoretical, from the survey methodology literature, or it could be empirical from prior survey data collection efforts. There’s potential for error to be introduced at almost every stage.
The key is to make use of all information available when making survey design decisions, so you can make the best decision about which errors to target first.
Sources of error in survey projects
Let’s review some of the sources of error from the TSE in a little more detail.
Errors of non-observation
- Coverage error
Coverage error is similar to sampling error in that it results from a mismatch between the sample and the population being measured. However, unlike sampling error, coverage error can be avoided. It results from a sampling method that somehow leaves out part of the target population, for example by using a recruitment method that is inaccessible to them (such as using TV ads to reach people who only have a radio).
- Sampling error
As we discussed, sampling error is the margin of error that inevitably creeps in when you use a sample to represent a larger population. Total accuracy in surveying is impossible – because we live in an imperfect world, no sample can be 100% representative.
- Nonresponse error
Non-response bias happens when there’s a balanced sample, but the people who actually answer your survey – inevitably a smaller group – end up being unrepresentative of it. The greater your response rate, the less of a problem this is likely to be.
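A common way to reason about non-response bias in a mean or proportion is that it equals the share of non-respondents multiplied by the difference between respondents and non-respondents. The sketch below applies that expression. The input values are hypothetical, and in a real project the non-respondent figure is unknown and has to be approximated from auxiliary information.

```python
def nonresponse_bias(response_rate: float,
                     mean_respondents: float,
                     mean_nonrespondents: float) -> float:
    """Bias of the respondent mean relative to the full-sample mean.

    bias = (1 - response_rate) * (respondent mean - non-respondent mean)
    """
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    return (1.0 - response_rate) * (mean_respondents - mean_nonrespondents)


# Hypothetical: 60% of respondents agree, but only 40% of non-respondents would.
for rate in (0.25, 0.50, 0.90):
    print(rate, nonresponse_bias(rate, 0.60, 0.40))
```

The bias shrinks as the response rate rises, but it also vanishes at any response rate if respondents and non-respondents do not differ on the measure in question.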
Errors of observation
- Instrument error
Instrument error refers to problems with the survey questionnaire itself. Maybe it’s too long, too complex, the wording is confusing or introduces bias, or there’s an issue with the survey logic which means the experience is poor.
- Respondent error
This is error that stems from the respondents themselves. They may give incorrect or inconsistent information or misrepresent themselves. An example of respondent error is straight-lining, where the respondent gives the same answer or checks the same box on every question. If you offer survey rewards, you may be more at risk of bad-faith respondents and survey fraud if you don’t have security steps in place.
- Interviewer error
This one mostly applies if you are using face-to-face or telephone surveys. It might happen if the person recording responses ticks the wrong box by mistake or mixes up their paperwork. Fortunately, with online surveys becoming the standard, this is less of a worry for researchers than it once was.
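Straight-lining, the respondent error mentioned above, is one of the easier problems to screen for after data collection. The following is a minimal sketch that assumes each respondent’s answers to a grid of rating questions are stored as a list, with `None` for skipped items. The respondent IDs, the data, and the `min_items` threshold are illustrative assumptions.

```python
from typing import Optional, Sequence


def is_straight_lined(answers: Sequence[Optional[int]], min_items: int = 5) -> bool:
    """True if a respondent gave the identical answer to every answered item.

    Respondents who answered fewer than min_items items are not flagged,
    since a short run of identical answers is weak evidence.
    """
    answered = [a for a in answers if a is not None]
    return len(answered) >= min_items and len(set(answered)) == 1


# Hypothetical grid responses on a 1-5 scale.
grid_responses = {
    "r001": [3, 3, 3, 3, 3, 3],              # same answer throughout
    "r002": [4, 2, 5, 3, 4, 1],              # varied answers
    "r003": [5, 5, None, None, None, None],  # too few answers to judge
}

flagged = [rid for rid, answers in grid_responses.items() if is_straight_lined(answers)]
print(flagged)
```

A flag like this is a prompt for review rather than proof of bad faith, since a respondent can legitimately hold the same view on every item.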
Errors of processing
- Coding, editing and adjustment errors
These are errors that appear after data has been collected and while it’s being processed. It might relate to a problem in interpreting patterns in data, using statistical tests or when doing data cleaning.
Common errors to avoid when writing surveys
As you can see from the examples so far, some sources of error are more amenable to researcher control than others – and some are totally beyond our reach. With that in mind, let’s focus on an aspect of survey research that’s completely within the researcher’s scope of influence – the survey questionnaire.
Here are 9 survey mistakes to avoid in order to get better data and make better decisions.
1. Loaded questions and leading words
The language you use should be as neutral as possible – even a small degree of bias could change how people answer.
- Look out for words that feel emotionally weighty, either in the positive (‘welcomed’, ‘celebrated’, ‘familiar’) or negative (‘prohibit’, ‘compel’, ‘demand’)
- Avoid phrases that sound persuasive or leading (“wouldn’t you like to…?”)
- Stay consistent when using synonymous words. Our data shows that even substituting “could”, “might” and “should” can have a meaningful impact on your final results.
2. Misplaced questions
Question order matters. How you structure your question flow could unintentionally introduce bias. To limit this, there are a few things you can try.
- Randomize question order so that each respondent gets a different experience. If your survey is more complex with interdependent questions, consider randomizing blocks of questions.
- Be wary of “ringer” or “throw away” questions. These are questions included to capture participant interest that don’t feed into the final data. Use them with caution, as they can introduce unnecessary noise into your data.
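Block randomization of the kind described above can be sketched in a few lines. This example assumes the questionnaire is represented as a list of blocks, each a list of question IDs. It shuffles the order of the blocks while keeping the order inside each block intact, so interdependent questions stay together. The function name and question IDs are hypothetical, and most survey platforms provide this as a built-in setting.

```python
import random
from typing import List, Optional, Sequence


def randomize_blocks(blocks: Sequence[Sequence[str]], seed: Optional[int] = None) -> List[str]:
    """Return a flat question order with block order shuffled.

    Questions inside a block keep their original order.
    Passing a per-respondent seed makes the order reproducible.
    """
    rng = random.Random(seed)
    shuffled = [list(block) for block in blocks]  # copy so the input is untouched
    rng.shuffle(shuffled)
    return [question for block in shuffled for question in block]


blocks = [["q1", "q2"], ["q3"], ["q4", "q5"]]
for respondent_id in (101, 102, 103):
    print(respondent_id, randomize_blocks(blocks, seed=respondent_id))
```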
3. Mutually non-exclusive response categories
Q. Are you: a) 15-25 years old, b) 25-50 years old, c) 50-55 years old?
If you’re 25 or 50, you will struggle to answer a question set up like this. Use mutually exclusive multiple-choice questions as a general rule.
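Overlapping numeric categories like the ones in this example can be caught automatically. The sketch below assumes each response option is written as an inclusive `(low, high)` range and reports every pair that shares at least one value. The function name is illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Range = Tuple[int, int]


def overlapping_pairs(ranges: Sequence[Range]) -> List[Tuple[Range, Range]]:
    """Return all pairs of inclusive ranges that share at least one value."""
    return [
        (a, b)
        for a, b in combinations(ranges, 2)
        if a[0] <= b[1] and b[0] <= a[1]
    ]


# The age brackets from the question above: 25 and 50 each fit two options.
print(overlapping_pairs([(15, 25), (25, 50), (50, 55)]))

# A mutually exclusive version of the same brackets has no overlaps.
print(overlapping_pairs([(15, 24), (25, 49), (50, 55)]))
```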
4. Non-exhaustive listings
It’s frustrating and confusing for respondents if they’re provided with a list of responses that don’t include the one that’s correct for them.
- Pretest your survey questionnaire to check that the listed options cover at least 90% of likely responses.
- Add a final “other” option with a free text box for the respondent to tell you their answer if it’s not covered in your multiple-choice list.
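One way to check the 90% guideline is to ask the question as free text during a pretest and measure how many answers the planned option list would have captured. This is a minimal sketch with made-up pretest answers. Matching is a simple case-insensitive comparison, which is an assumption that would not handle typos or synonyms.

```python
from typing import Iterable, Sequence


def option_coverage(pretest_answers: Sequence[str], listed_options: Iterable[str]) -> float:
    """Share of pretest answers that match one of the listed options."""
    if not pretest_answers:
        raise ValueError("no pretest answers supplied")
    listed = {option.strip().lower() for option in listed_options}
    covered = sum(1 for answer in pretest_answers if answer.strip().lower() in listed)
    return covered / len(pretest_answers)


# Hypothetical pretest: "How do you usually travel to work?"
options = ["Car", "Bus", "Train"]
answers = ["car", "Bus", "bike", "train", "car", "Car", "bus", "train", "car", "scooter"]

coverage = option_coverage(answers, options)
print(f"{coverage:.0%} of pretest answers are covered")
if coverage < 0.90:
    print("Consider adding options or an 'other' free-text box.")
```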
5. Nonspecific questions
Fuzzy question wording leads to fuzzy answers and unhelpful data.
- Be specific about what you want to know.
- Use the wording of your question to set the parameters of the answer, e.g. “thinking about the last 6 months, how often did you visit your nearest corner store?”
- Collect feedback on the clarity of your questions when you pretest your survey
6. Confusing or unfamiliar words
Plain language for a low reading age should be the default for your questionnaire.
- Avoid specific jargon or confusing acronyms
- Make sure your audience understands your language level, terminology, and intent
- Keep it simple (9th to 11th grade reading level)
7. Forcing respondents to answer
If you’re observing good survey ethics, your respondents should never feel forced to answer or like they have to participate when they are uncomfortable.
- Make sensitive questions skippable if you can
- Include a don’t know or prefer not to say option
- Word sensitive questions carefully, and check how the wording is perceived when you pretest your survey
8. Unbalanced listings
When using rating scale and slider questions, make sure the scale isn’t loaded in one direction or another.
- Make sure your scale includes obvious and equally weighted positive, negative and neutral options
- Clearly identify the end-points of your scales
9. Double-barreled questions
Survey questionnaires should be short and sweet, but avoid squeezing multiple questions into one survey item.
Instead of: “How happy are you with the speed and quality of your internet connection?”
Ask: “How happy are you with your internet connection quality?”
Then ask: “How happy are you with your internet speed?”
How to minimize survey errors
Pretesting your survey is a best practice we always recommend since it can make or break the quality of your final results. It’s very hard to go back and re-run a survey because it had errors. But it’s easy to just run a few simple checks before you launch to ensure you’ve avoided major survey pitfalls.
Guerrilla test first
Once you have written your survey, test it on a small group of people inside your organization. Have 5 people – ideally they should match your sample as closely as possible – take the survey and provide you with feedback. The main things you are testing for are:
- Were any questions unclear?
- Were any questions leading?
- Did any of the questions take too long to answer?
- Did the survey present well on smaller screens like a phone or tablet?
- Were there any page errors?
- Was the logic working correctly?
- Did the survey take as long to answer as was promised?
- Were all the options in the questions applicable to the question?
Send test responses
You can test your survey’s functionality by running dummy test responses through it. Use these test responses to ensure that your logic and embedded code are working correctly.
Another option for testing your survey is to run a soft launch, aka a pilot study. This means you only send your survey to 10% of your sample list and then pause data collection so that you can go through your data to ensure everything is running smoothly.
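Drawing the soft-launch group at random, rather than taking the first names on the list, helps keep the pilot comparable to the full sample. The sketch below assumes the sample list is a list of unique contact identifiers. The function name, the example addresses, and the fixed seed are illustrative.

```python
import random
from typing import List, Optional, Sequence, Tuple


def soft_launch_split(sample_list: Sequence[str],
                      fraction: float = 0.10,
                      seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Split a sample list into a random pilot group and the remainder.

    Assumes contacts are unique. At least one contact is always piloted.
    """
    if not sample_list:
        raise ValueError("sample_list is empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    pilot_size = max(1, round(len(sample_list) * fraction))
    pilot = rng.sample(list(sample_list), pilot_size)
    pilot_set = set(pilot)
    remainder = [contact for contact in sample_list if contact not in pilot_set]
    return pilot, remainder


contacts = [f"person{i}@example.com" for i in range(100)]
pilot, remainder = soft_launch_split(contacts, fraction=0.10, seed=42)
print(len(pilot), len(remainder))
```

Send to the pilot group first, review the data, and release the survey to the remainder only once everything looks right.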
Tools to help
AI and machine learning tools such as ExpertReview can lend a helping hand by reviewing your survey and automatically alerting you to any errors with the logic and question flow, potential compliance issues, and even potential wording bias in your survey questions. Its methodology is built on reviews of thousands of live surveys, so you’re getting the benefit of an exhaustive QA process without the exhausting work on your part.