The old adage “garbage in, garbage out” is as true in survey sampling as it is in any other context. To perform any sound statistical analysis, you first need to know who you actually sampled and who you wanted to sample in the first place. If those two groups don't match up and you can't correct for the difference, you might as well give up now and start over, because the bulk of your analysis will probably be misleading. Here are some steps to make sure you don't end up past the point of no return.

Step 1: Define your population
Define the characteristics that describe the group of people you'd like to learn about. This can range from very general (general consumers) to very specific (HR managers from Pittsburgh working in healthcare).
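One way to keep the population definition honest is to write the criteria down as an explicit check that you can run against a sampling frame or screener answers. The sketch below assumes a hypothetical frame whose records have `role`, `city` and `industry` fields; the `Person` class and `in_target_population` function are illustrative names, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical fields in a sampling frame or screener response.
    role: str
    city: str
    industry: str


def in_target_population(p: Person) -> bool:
    """Inclusion criteria for the example population:
    HR managers from Pittsburgh working in healthcare."""
    return (
        p.role == "HR manager"
        and p.city == "Pittsburgh"
        and p.industry == "healthcare"
    )


frame = [
    Person("HR manager", "Pittsburgh", "healthcare"),
    Person("HR manager", "Cleveland", "healthcare"),
    Person("Engineer", "Pittsburgh", "healthcare"),
    Person("HR manager", "Pittsburgh", "retail"),
]

# Keep only the records that meet every criterion.
population = [p for p in frame if in_target_population(p)]
print(len(population))  # 1
```

The same check can later be applied to your actual respondents to see how many of them really belong to the population you meant to study.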

Step 2: Define subgroups
This step is optional. Are there any factors that could affect what you're trying to measure?

If not, skip this step. If so, write them down.

In survey sampling, you want your sample to be as representative as possible of the population you defined in Step 1. Knowing your subgroups lets you check this and, where the sample falls short, correct for it in the analysis. The following example shows how.

Example: Let’s say your last survey included a question asking whether the respondent follows the ABC TV show “Dancing with the Stars.” The raw results show that 61.5% of your 800 respondents follow the show. Breaking that number down by gender, we find that 39% of males and 73% of females follow the show.

Now, let’s put gender into the equation. Let’s say that, out of all your respondents, 65% were female and 35% were male. Assuming that TV watchers are an even 50/50 split by gender, females are overrepresented in your sample, so we want to down-weight them (and up-weight males). Giving each gender its population share, the weighted proportion of respondents following the show is 0.5 × 73% + 0.5 × 39% = 56%.
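The weighting arithmetic in this example can be sketched in a few lines of Python. Each gender's weight is its population share divided by its sample share; the 50/50 population split is the same assumption the example makes, and the variable names are illustrative only.

```python
# Sample composition and "follows the show" rates by gender,
# taken from the example: 65% female, 35% male respondents.
sample_share = {"female": 0.65, "male": 0.35}
follow_rate = {"female": 0.73, "male": 0.39}

# Assumed gender split among TV watchers (50/50, as in the example).
population_share = {"female": 0.50, "male": 0.50}

# Post-stratification weight per group: population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}

# Unweighted estimate: each group counts in proportion to its sample share.
unweighted = sum(sample_share[g] * follow_rate[g] for g in sample_share)

# Weighted estimate: each group's sample share is rescaled by its weight,
# which is equivalent to using the population shares directly.
weighted = sum(sample_share[g] * weights[g] * follow_rate[g] for g in sample_share)

print({g: round(w, 3) for g, w in weights.items()})
print(f"unweighted: {unweighted:.1%}")
print(f"weighted:   {weighted:.1%}")
```

Females get a weight of about 0.77 and males about 1.43. The unweighted figure comes out at 61.1% rather than the reported 61.5% only because the subgroup percentages are rounded; the weighted estimate is 56%.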

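We see how defining additional subgroups in our sample helped us get a more accurate estimate. In general, the more of the relevant factors your design accounts for, the more accurate your estimates will be, as long as each subgroup still contains enough respondents to give a stable estimate.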

Have subgroups within your subgroups? Repeat Step 2 as many times as needed.

Step 3: Determine sampling design
The sampling design for your study depends on how you defined the groups in Steps 1 and 2.

Simple Random Sample: You skipped Step 2 altogether, and everybody in the group has the same chance of being selected. You’re performing a simple random sample of the population you defined.
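A simple random sample can be sketched with Python's standard `random` module, assuming a hypothetical frame of 10,000 IDs and a target of 800 respondents. Because everyone has the same inclusion probability n/N, everyone also gets the same design weight N/n.

```python
import random

# Hypothetical sampling frame: one ID per member of the Step 1 population.
frame = list(range(1, 10_001))  # N = 10,000
n = 800

rng = random.Random(42)  # fixed seed so the draw is reproducible
sample = rng.sample(frame, n)  # without replacement; every ID equally likely

# Equal inclusion probability n/N gives every respondent the weight N/n.
N = len(frame)
inclusion_prob = n / N
design_weight = N / n
print(inclusion_prob, design_weight)  # 0.08 12.5
```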

Probability Proportional to Size (PPS) Sample: You skipped Step 2, but some units had a higher probability of selection, in proportion to some measure of their size (for example, a company's number of employees).
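Here is a minimal PPS sketch, assuming a hypothetical frame of five businesses whose size measure is employee count. It draws *with* replacement using `random.choices` and weights each draw by 1 / (n × pᵢ), the Hansen-Hurwitz weighting; many real surveys use PPS without replacement (for example, systematic PPS), which needs different weights.

```python
import random

# Hypothetical frame: businesses and their size measure (employee count).
sizes = {"A": 500, "B": 200, "C": 150, "D": 100, "E": 50}
total = sum(sizes.values())

# Per-draw selection probability is proportional to size.
p = {unit: s / total for unit, s in sizes.items()}

n = 3
rng = random.Random(7)
units = list(sizes)
draws = rng.choices(units, weights=[sizes[u] for u in units], k=n)  # with replacement

# Hansen-Hurwitz weight for each draw: 1 / (n * p_i).
weights = [1 / (n * p[u]) for u in draws]

# Sanity check: estimating the total of the size measure itself
# recovers the true total (1,000) no matter which units were drawn.
est_total = sum(w * sizes[u] for w, u in zip(weights, draws))
print(draws, [round(w, 2) for w in weights], round(est_total, 6))
```

Larger units are drawn more often but receive smaller weights, which is what keeps the estimates unbiased.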

Stratified Sample: You defined subgroups in Step 2, and possibly even subgroups within the subgroups. Each subgroup is considered a stratum and is sampled separately; when you draw a simple random sample inside each stratum, the sample weights for each individual are the same within that stratum.
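A stratified sample can be sketched as one simple random sample per stratum. The frame sizes (6,000 females, 4,000 males) and the equal allocation of 400 per stratum are hypothetical; each stratum's weight is N_h / n_h, shared by everyone in that stratum.

```python
import random

# Hypothetical frame: person IDs grouped by stratum (e.g. gender from Step 2).
strata = {
    "female": list(range(0, 6000)),    # N_h = 6,000
    "male": list(range(6000, 10000)),  # N_h = 4,000
}
# Assumed sample size per stratum (equal allocation here).
allocation = {"female": 400, "male": 400}

rng = random.Random(2024)
sample = {}
weights = {}
for h, units in strata.items():
    n_h = allocation[h]
    sample[h] = rng.sample(units, n_h)  # simple random sample within stratum h
    weights[h] = len(units) / n_h       # N_h / n_h, same for everyone in h

print(weights)  # {'female': 15.0, 'male': 10.0}
```

Equal allocation oversamples the smaller stratum, and the weights undo that in the analysis; proportional allocation (n_h proportional to N_h) would give every respondent the same weight.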

Cluster Sample: You went crazy.

Cluster sampling selects whole groups (clusters) first, such as schools or neighborhoods. Some people call it multi-stage sampling when you go further: take a sample of clusters, take a sample from within the clusters you already selected, and so forth as much as is necessary. This is the most complex sampling method, both in design and in analysis.
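A two-stage version can be sketched as follows, assuming a hypothetical frame of 50 schools (clusters) with randomly generated roster sizes. Stage 1 draws 10 schools at random; stage 2 draws 20 students within each chosen school. A student's weight is the inverse of the product of the two selection probabilities, (M/m) × (N_k/n).

```python
import random

rng = random.Random(3)

# Hypothetical frame: 50 schools (clusters), each with its own student roster.
schools = {
    f"school_{k}": [f"s{k}_{j}" for j in range(rng.randint(80, 120))]
    for k in range(50)
}

M, m = len(schools), 10  # clusters in the frame, clusters to select
n_within = 20            # students to select inside each chosen school

# Stage 1: simple random sample of schools.
chosen = rng.sample(sorted(schools), m)

# Stage 2: simple random sample of students within each chosen school.
sample = []
for school in chosen:
    roster = schools[school]
    picked = rng.sample(roster, n_within)
    # Weight = 1 / (P(school chosen) * P(student chosen | school chosen)).
    w = (M / m) * (len(roster) / n_within)
    sample.extend((student, school, w) for student in picked)

print(len(sample))  # 200
```

Students in larger schools get larger weights because each of them had a smaller chance of being picked at stage 2.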

Whatever your design is, you can incorporate it into your analysis with any basic statistical software (SPSS, SAS, Stata, R). There’s more to sampling design and analysis than we can possibly blog about, but we’ve found these to be the most common.