First things first – What is data cleaning?
Cleaning data means getting rid of any anomalous, incorrectly filled or otherwise “odd” results that could skew your analysis. Some examples include:
Straight-lining, where the respondent has selected the same response (often the first one) to every question, regardless of what the question asks.
Christmas-trees, where answers have been selected to create a visual pattern or picture – resembling a Christmas tree or some other deliberate design – rather than in response to the survey questions.
How to find the ‘dirt’ when data cleaning
There are a few methods experienced survey designers use to spot the results that should be weeded out. These can involve looking at the metadata of the survey or visualising data to uncover patterns.
Find the fastest respondents
Time data can show where respondents have whizzed through a survey selecting answers without properly reading and considering the questions. Setting a ‘speed limit’ for your responses can help eliminate thoughtless or random answers.
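As a rough illustration, a ‘speed limit’ could be expressed as a fraction of the median completion time. The sketch below is only one possible approach: the function name `flag_speeders`, the 40% cutoff and the sample durations are all assumptions made for the example, not a recommended standard.

```python
from statistics import median


def flag_speeders(durations_sec, fraction_of_median=0.4):
    """Return the ids of respondents who finished faster than the speed limit.

    durations_sec maps a respondent id to completion time in seconds.
    The limit is a fraction of the median time (0.4 is an assumed value).
    """
    cutoff = median(durations_sec.values()) * fraction_of_median
    return sorted(rid for rid, secs in durations_sec.items() if secs < cutoff)


# Hypothetical completion times, in seconds
durations = {"r1": 310, "r2": 295, "r3": 48, "r4": 402, "r5": 330}
print(flag_speeders(durations))  # ['r3']
```

Using the median rather than the mean keeps a handful of very slow respondents from dragging the limit upwards. Flagged responses are best treated as candidates for review rather than removed automatically.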
Turn numeric data into graphics
For issues like Christmas tree or straight-lining respondents, it can be easier to spot problems if your data appears as a chart or graph rather than a table of numbers.
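To show the idea without any charting software, the sketch below draws one respondent's answers as a simple text chart and adds a basic straight-lining check. It assumes answers are stored as whole numbers on a 1 to 5 scale; the function names `answer_profile` and `is_straight_line` are made up for this example.

```python
def answer_profile(answers, scale_max=5):
    """Draw one respondent's answers as a text chart, one row per question."""
    return "\n".join(
        f"Q{i}: " + "#" * answer + "." * (scale_max - answer)
        for i, answer in enumerate(answers, start=1)
    )


def is_straight_line(answers):
    """True when every question received exactly the same answer."""
    return len(set(answers)) == 1


# Hypothetical respondents answering five questions on a 1-5 scale
straight_liner = [1, 1, 1, 1, 1]
christmas_tree = [1, 2, 3, 2, 1]

print(answer_profile(straight_liner))
print(is_straight_line(straight_liner))  # True
print(answer_profile(christmas_tree))
print(is_straight_line(christmas_tree))  # False
```

In the printed chart a straight-liner shows up as a flat column of identical bars, while a Christmas-tree respondent produces a diagonal or zigzag shape that is far easier to see than the same pattern in a table of numbers.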
Review open-ended questions
Where your survey design requires participants to answer in their own words, you can spot problem data by noting where the open fields have been filled in with nonsense text. This could indicate that the survey has been completed by a bot rather than a human, or that the respondent was not engaged with the questions.
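A first pass over open-ended answers can be automated with a few crude rules, leaving a human to review whatever gets flagged. The sketch below is a heuristic only and assumes English-language answers; the function name `looks_like_nonsense` and the thresholds (three letters, four repeated characters, 20% vowels) are illustrative assumptions and will produce some false positives, such as very short but genuine answers.

```python
import re


def looks_like_nonsense(text, min_letters=3):
    """Rough check for keyboard mashing or empty filler in an open answer."""
    cleaned = text.strip().lower()
    letters = re.sub(r"[^a-z]", "", cleaned)
    if len(letters) < min_letters:
        return True  # empty, punctuation-only or too short to judge
    if re.search(r"(\S)\1{3,}", cleaned):
        return True  # the same character four or more times in a row
    vowel_share = sum(ch in "aeiou" for ch in letters) / len(letters)
    return vowel_share < 0.2  # random key presses tend to be consonant-heavy


# Hypothetical open-ended answers
answers = [
    "The delivery was late but support was helpful",
    "asdfghjkl",
    "aaaaaa",
    "...",
]
for answer in answers:
    print(repr(answer), looks_like_nonsense(answer))
```

Only the first answer passes; the other three are flagged for a closer look.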