What is sampling error and why does it matter?
To understand what sampling error is, you first need to know a little bit about sampling and what it means in survey research. (If you’re all clued up on sampling already, feel free to skip ahead to the next section.)
When you’re running a survey, you’re usually interested in a much bigger group of people than you can reach. The practical solution is to take a representative sample – a group that stands in for the whole of your research population.
To make sure that your sample is a fair representation, you need to follow some survey sampling best practices. Perhaps the most well-known of these is getting your sample size right. (Too big and you’re putting in lots of work for no meaningful gain; too small and you can’t be sure your sample is representative.)
But there’s more to doing sampling well than just getting the right sample size. For this reason, it is important to understand both sampling error and non-sampling errors so you can prevent them from causing problems in your research.
Non-sampling errors vs. sampling error: definitions
Somewhat confusingly, the term ‘sampling error’ doesn’t mean mistakes researchers have made when selecting or working with a sample. Problems like choosing the wrong people, letting bias enter the picture, or failing to anticipate that participants will self-select or fail to respond: those are non-sampling errors, and we’ll cover several of the worst offenders later in the article.
Non-sampling errors can happen whether you’re working with a representative sample (such as with a national survey) or doing total enumeration (such as when you’re carrying out employee experience surveys with your workforce).
Meanwhile, sampling error is the difference between a value calculated from a sample (such as the sample mean) and the true value for the whole population, so it only happens when you’re working with a sample rather than the entire population.
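To make the definition concrete, here is a small Python sketch. It assumes a made-up population of 10,000 values (think of them as weekly spend in dollars; the numbers are synthetic, not real survey data), draws several simple random samples, and reports how far each sample mean lands from the population mean. Because the whole population is simulated, the error can be measured directly, which is exactly what you can’t do in a real study.

```python
import random
import statistics

# Hypothetical population: 10,000 synthetic values (e.g. weekly spend in dollars).
# These numbers are invented purely for illustration.
random.seed(42)
population = [random.gauss(50, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)

def sampling_error(sample_size: int) -> float:
    """Draw one simple random sample and return sample mean minus population mean."""
    sample = random.sample(population, sample_size)
    return statistics.mean(sample) - population_mean

# Every sample below is drawn correctly at random, yet each one
# misses the population mean by a slightly different amount.
errors = [sampling_error(100) for _ in range(5)]
for e in errors:
    print(f"{e:+.2f}")
```

No mistake was made in drawing any of these samples; the differences come purely from chance in who was selected.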
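It’s usually not possible to know the exact sampling error in a study since, by definition, the relevant data for the entire population is not measured. For a random sample, however, its likely size can be estimated statistically, which is what a reported margin of error describes.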
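For a simple random sample, the typical size of the sampling error can be estimated from the sample alone, using the standard error of the mean (the sample standard deviation divided by the square root of the sample size). The sketch below turns that into an approximate 95% margin of error. The function name `margin_of_error` and the 400 synthetic ratings are illustrative assumptions, and the calculation ignores the finite population correction.

```python
import math
import random
import statistics

def margin_of_error(sample, z=1.96):
    """Approximate 95% margin of error for the mean of a simple random sample.

    Standard error = s / sqrt(n). The finite population correction is
    ignored; it only matters when the sample is a large share of the population.
    """
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return z * standard_error

# Hypothetical survey responses: 400 synthetic ratings on a 0-10 scale.
random.seed(1)
sample = [random.randint(0, 10) for _ in range(400)]
mean = statistics.mean(sample)
moe = margin_of_error(sample)
print(f"mean = {mean:.2f} ± {moe:.2f}")
```

This tells you how large the error is likely to be, not what it actually is for your particular sample, and it says nothing about non-sampling errors such as a flawed frame or non-response.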
As the OECD explains, a sample will never represent a population perfectly, because the population is larger and more complete. In this sense, sampling error is a feature of sampling rather than a human error, and it can’t be completely avoided.
However, sampling error can absolutely be reduced by following good practices – more on that below.
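One of the most direct ways to reduce sampling error is a larger sample. For a simple random sample, the typical error of a sample mean shrinks roughly in proportion to 1/√n, so quadrupling the sample size roughly halves it. The Python sketch below checks this on a synthetic population (an assumption for illustration) by averaging the absolute error over many random samples at each size.

```python
import random
import statistics

random.seed(7)
# Hypothetical population of 20,000 synthetic values.
population = [random.gauss(50, 15) for _ in range(20_000)]
true_mean = statistics.mean(population)

def typical_error(n, trials=500):
    """Average absolute sampling error of the mean over many random samples of size n."""
    return statistics.mean(
        abs(statistics.mean(random.sample(population, n)) - true_mean)
        for _ in range(trials)
    )

# Each step quadruples the sample size.
results = {n: typical_error(n) for n in (25, 100, 400, 1600)}
for n, err in results.items():
    print(f"n = {n:>4}: typical error ≈ {err:.2f}")
```

The diminishing returns are also why a sample can be too big: going from 400 to 1,600 responses costs four times as much effort for only about half the error.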
Sampling and non-sampling errors: 5 examples
1. Population specification error (non-sampling error)
This error occurs when the researcher misidentifies who should be surveyed. For example, imagine a survey about breakfast cereal consumption in families. Who should you survey? It might be the entire family, the person who most often does the grocery shopping, or the children. The shopper might make the purchase decision, but the children influence the cereal choice.
2. Sample frame error (non-sampling error)
A frame error occurs when the wrong sub-population is used to select a sample. A classic frame error occurred in the Literary Digest poll of the 1936 presidential election between Roosevelt and Landon. The sample frame was drawn from car registrations and telephone directories. In 1936, many Americans did not own cars or telephones, and those who did were disproportionately Republican. The results wrongly predicted a Republican victory.
The error here lies in the way the sample was selected. Bias was unconsciously introduced because the researchers didn’t anticipate that only certain kinds of people would show up in their list of respondents, so parts of the population of interest were excluded. A modern equivalent might be sampling only from cell phone numbers and inadvertently missing adults who don’t own a cell phone, such as some older people or those with severe learning disabilities.
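A frame error behaves very differently from sampling error: a bigger sample doesn’t fix it. The Python sketch below uses entirely hypothetical numbers (not the 1936 data): 40% of people appear in the sampling frame and 35% of them support candidate X, while 70% of those outside the frame support X. Sampling only from the frame gives an estimate that converges on the wrong answer as the sample grows.

```python
import random

random.seed(3)
# Hypothetical population of 100,000 people, each stored as (in_frame, supports_x).
# Assumption: 40% appear in the sampling frame (e.g. own a listed phone) and 35%
# of them support candidate X; of the 60% outside the frame, 70% support X.
population = []
for _ in range(100_000):
    in_frame = random.random() < 0.40
    support_rate = 0.35 if in_frame else 0.70
    population.append((in_frame, random.random() < support_rate))

true_share = sum(s for _, s in population) / len(population)
frame = [person for person in population if person[0]]

def frame_estimate(n):
    """Estimate support for X from a simple random sample of the flawed frame."""
    sample = random.sample(frame, n)
    return sum(s for _, s in sample) / n

print(f"true share: {true_share:.2f}")
for n in (100, 1_000, 10_000):
    print(f"frame sample n={n:>6}: {frame_estimate(n):.2f}")
```

The estimates settle near the frame’s own support level rather than the population’s, however many people are surveyed.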
Frame errors can also happen when respondents from outside the population of interest are incorrectly included. For example, say a researcher is doing a national study. Their list might be drawn from a geographical map area that accidentally includes a small corner of a foreign territory – and therefore include respondents who are not relevant to the scope of the study.
3. Selection error (non-sampling error)
This occurs when respondents self-select their participation in the study, so only those who are interested respond. It can also be introduced from the researcher’s side as a non-random sampling error. For example, if a researcher puts out a call for responses on social media, they’re likely to get responses mostly from people in their own network, and of those people, the more helpful or affable individuals are the most likely to reply.
Selection error can be controlled by going to extra lengths to get participation. A typical survey process includes pre-survey contact requesting cooperation, the survey itself, and post-survey follow-up. If a response is not received, a second survey request follows, and perhaps interviews using alternative modes such as telephone or face-to-face.
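The effect of self-selection, and of a follow-up wave, can be sketched with a small simulation. Everything here is an assumption for illustration: each person has an interest level between 0 and 1, the chance of answering the first request rises with interest (0.1 + 0.6 × interest), and a follow-up in another mode reaches half of the remaining people regardless of interest. The quantity estimated is interest itself; in a real survey, interest often correlates with the answers you care about.

```python
import random
import statistics

random.seed(11)
# Hypothetical population: each person's interest in the survey topic, from 0 to 1.
interest = [random.random() for _ in range(50_000)]
true_mean = statistics.mean(interest)

# First contact (assumption): more interested people are more likely to respond.
respondents, nonrespondents = [], []
for u in interest:
    if random.random() < 0.1 + 0.6 * u:
        respondents.append(u)
    else:
        nonrespondents.append(u)
wave1_mean = statistics.mean(respondents)

# Follow-up (assumption): a reminder in an alternative mode reaches half of
# the remaining people, independent of their interest.
followed_up = [u for u in nonrespondents if random.random() < 0.5]
combined_mean = statistics.mean(respondents + followed_up)

print(f"true mean interest:  {true_mean:.3f}")
print(f"after first contact: {wave1_mean:.3f}")
print(f"after follow-up:     {combined_mean:.3f}")
```

Under these assumptions the first-contact estimate overstates interest noticeably, and the follow-up wave shrinks, but does not eliminate, that bias.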
4. Non-response (non-sampling error)
Non-response errors occur when respondents differ from those who do not respond. For example, say you’re a company doing market research in advance of launching a new product. You might get a disproportionate level of participation from your existing customers, since they know who you are, and miss out on hearing from a broader pool of people who don’t yet buy from you.
This may occur because the potential respondent was either not contacted or refused to respond. The extent of non-response error can be checked through follow-up surveys using alternative modes.
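Once a follow-up gives you an estimate for the non-respondents, the size of the non-response error follows from simple arithmetic. Because the full-sample mean is a weighted average of respondents and non-respondents, the respondent mean overshoots it by (non-response rate) × (respondent mean − non-respondent mean). The figures below are hypothetical, and in practice the non-respondent value is itself an estimate from a follow-up subsample.

```python
def nonresponse_bias(n_respondents, n_nonrespondents,
                     mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean relative to the mean of everyone invited.

    full_mean = (n_r * mean_r + n_m * mean_m) / (n_r + n_m), so
    mean_r - full_mean = (n_m / (n_r + n_m)) * (mean_r - mean_m).
    """
    total = n_respondents + n_nonrespondents
    nonresponse_rate = n_nonrespondents / total
    return nonresponse_rate * (mean_respondents - mean_nonrespondents)

# Hypothetical figures: 600 of 1,000 invited people responded and 70% of them
# said they would buy the new product; a follow-up in another mode suggests
# about 40% of non-respondents would.
bias = nonresponse_bias(600, 400, 0.70, 0.40)
print(f"estimated bias: {bias:+.2f}")  # → +0.12
```

In this example, the respondents alone suggest 70% purchase intent, while the full invited group is closer to 58%.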
5. Sampling error
As described previously, sampling errors occur because of chance variation in the size and makeup of the sample that responds. Sampling errors can be controlled and reduced by (1) careful sample design, (2) samples that are large enough (check out our online sample size calculator), and (3) multiple contacts to ensure a representative response.
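If you’d like to see the arithmetic behind a basic sample size calculation, the Python sketch below uses the standard textbook formula for estimating a proportion (Cochran’s formula, n₀ = z²p(1 − p)/e²) with an optional finite population correction. This is a generic illustration, not necessarily how the calculator mentioned above works; the function name `sample_size` and its defaults (95% confidence, p = 0.5 as the most conservative assumption) are our own choices. It only addresses random sampling error, not non-sampling errors.

```python
import math

def sample_size(margin_of_error, population_size=None, confidence_z=1.96, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin_of_error.

    Cochran's formula: n0 = z^2 * p * (1 - p) / e^2, with p = 0.5 as the
    most conservative choice. If a population size N is given, apply the
    finite population correction n = n0 / (1 + (n0 - 1) / N).
    """
    n0 = confidence_z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

print(sample_size(0.05))                        # → 385
print(sample_size(0.05, population_size=2000))  # → 323
```

Note how the required sample barely depends on population size once the population is large, which is why national surveys don’t need proportionally huge samples.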
Be sure to keep an eye out for these sampling and non-sampling errors so you can avoid them in your research.