What is MaxDiff Analysis?
MaxDiff analysis is a market research technique for measuring the preference and importance that customers place on a list of items. It can play a critical role in understanding the trade-offs that people would make and ultimately provides a rank-ordering of the list. It can be used on lists of features/functionality of a product, messaging, claims, attributes, characteristics and much more. It is sometimes referred to as Best-Worst Scaling or maximum difference scaling, and was developed by J.J. Louviere.
How is it conducted?
MaxDiff analysis is conducted by showing participants subsets of items from a list and having the respondent identify the best and worst (or most and least preferred) options in each subset. The reason for this approach is that it is challenging for a respondent to rank-order 7 or more items in a single survey question. MaxDiff instead leverages our ability to reliably identify the extremes (best and worst) within a short list, simplifying the task to a more digestible number of items at a time.
A respondent would typically see around 5 to 15 questions where 3 to 5 items are shown, and they are asked to indicate the best and worst from the list. This yields very accurate data, since it is a more comprehensible task for survey takers than presenting the entire list.
The steps for running a MaxDiff analysis are:
- Determine the attributes to be tested in the MaxDiff analysis.
- Generate the experimental design.
- Program the survey that hosts the MaxDiff tasks.
- Collect responses.
- Analyze the MaxDiff results.
- Report the findings.
Each of these builds upon the previous action in working toward the end goal of understanding the preferences of the customer base.
Qualtrics has developed a MaxDiff XM Solution that enables researchers to quickly and simply run respondents through trade-off exercises as part of a larger research objective.
What business objectives does MaxDiff analysis provide answers for?
There are key business objectives that MaxDiff can deliver on. These include:
- How do customers prioritize product features or functionality?
- What are customers focused on when making their purchase decision?
- How do various messaging and product claims resonate with a target audience?
- When asked to choose from a finite list, how does the market perceive and value different products or services?
- How do different brands compare to each other, and how do customers rank-order them?
- What trade-offs will customers make when faced with different combinations of features?
As you can see, MaxDiff analysis can provide data for essential and dynamic business questions. There are also many non-product related inquiries that MaxDiff can answer.
MaxDiff can be a very effective research method for many use cases because of its flexibility and easily interpreted output. It should be the instrument of choice whenever researchers need insights into the rank-ordering of a list.
With MaxDiff analysis, we are looking for a list of attributes that act as offerings in and of themselves, rather than as attributes to be bundled together. Thus the items should be mutually exclusive and stand alone. The items can be features/functionality of a product, messaging or claims about a product, benefits offered to users or employees, and many other use cases.
For a MaxDiff analysis, you generally want to list 8 to 25 items. The more items you include, the more questions you will need to ask, so try to keep respondent fatigue in mind when you design your research.
Example: A list of MaxDiff attributes for a cupcake store may include the following flavors / menu items:
- Chocolate with chocolate frosting
- Chocolate with vanilla frosting
- Carrot cake with cream cheese frosting
- Red velvet
- Strawberry shortcake
- Chocolate mint fudge
- Peanut butter fudge
- Salted caramel
- Cookies and creme
- Toffee crunch
- Key lime
- German chocolate fudge
- Coffee toffee
Experimental Design and MaxDiff Analysis
MaxDiff’s experimental design determines which items will be shown in the questions presented to respondents. The design ensures proper representation for the ultimate outcome of securing accurate and reliable results. If designed correctly, the survey will collect the data necessary to rank-order the items.
The general rule is that we want every respondent to see each item three times. The by-product of this rule is that more MaxDiff items will result in more questions. In addition to this rule, there are also other conditions that Qualtrics respects as part of generating the experimental design including: Randomization, Item Balance, Paired Balance, and Item Networking.
- Randomization: The question and position within a question that a MaxDiff item appears in is randomly assigned.
- Item Balance: The number of times each item is shown within a respondent’s question set (also known as the “version”) is balanced, so every item appears roughly the same number of times. The number of times each item is shown is also balanced across all versions.
- Paired Balance: The number of times each item is shown with all of the other items is relatively balanced across all respondents.
- Item Networking: This is also known as connectivity. This rule ensures that however the items might be divided into two groups, at least one item in each group is shown alongside an item from the other group; in other words, all items are connected through shared comparisons.
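The exposure rule above also determines survey length: with each item shown three times, the number of questions is the total exposures divided by the items per question. The sketch below illustrates one simple way a single respondent's version could be generated with random assignment and exact item balance; the function name and rejection-sampling approach are assumptions for illustration, not Qualtrics' actual design algorithm.

```python
import random
from collections import Counter

def generate_version(items, items_per_question=4, exposures_per_item=3, seed=0):
    """Randomly assign each item to exactly `exposures_per_item` questions."""
    rng = random.Random(seed)
    n_questions = len(items) * exposures_per_item // items_per_question
    # Pool contains each item exactly `exposures_per_item` times.
    pool = items * exposures_per_item
    while True:
        rng.shuffle(pool)
        questions = [pool[i * items_per_question:(i + 1) * items_per_question]
                     for i in range(n_questions)]
        # Reject shuffles that would repeat an item within one question.
        if all(len(set(q)) == items_per_question for q in questions):
            return questions

items = [f"flavor_{i}" for i in range(12)]  # 12 items, 4 per screen
design = generate_version(items)
print(len(design))                          # 12 * 3 / 4 = 9 questions
counts = Counter(item for q in design for item in q)
print(set(counts.values()))                 # every item shown exactly 3 times
```

A production design would additionally enforce paired balance and connectivity across versions, which this toy sketch does not attempt.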
Survey & Sample Size
MaxDiff analysis is powered by survey responses. When a MaxDiff study is conducted, it is usually the focus of the survey, but it does not have to be the entirety of it. Regardless, it is critical that the MaxDiff exercise within the survey is concise and well structured.
MaxDiff surveys commonly include screener questions to ensure the right type of respondents complete the survey, introduction and educational resources, and demographic questions. There are no hard rules on how many other questions can be added to a MaxDiff study or where in the survey flow the MaxDiff should fall. It should be noted, however, that any question asked of the respondents outside of the MaxDiff exercise takes up time and focus that could otherwise be given to the MaxDiff exercise.
Survey length should be considered as the study is being designed and built out. When a respondent is fatigued by the survey, they are less likely to give thoughtful responses, thus lowering your data’s quality. Surveys that take more than 10-15 minutes are more susceptible to fatigue and data quality issues.
The data fielded from a MaxDiff study is only reliable and accurate if the respondent fully understands the premise of the study. Many studies test concepts that are well-known and relatable to the general public. If that is not the case, however, time should be devoted in advance of the MaxDiff exercise to properly educate the respondent through descriptions and/or videos. The clearer and more imaginable a product is to the survey-taker, the more accurate the resulting utility scores will be.
In addition to the text and descriptions being simple and straightforward, the layout of each question should also lend itself to understanding and clarity. This allows the respondent to make comparisons and answer definitively.
Critical to the success and accuracy of the MaxDiff results is the number of responses to collect, as well as the relevance of the subject matter to the individuals taking the survey. A general rule of thumb is to collect a minimum total sample size of 300. With that in mind, it is also important to factor segments of interest into the number of collected responses. We recommend that each segment has an n > 150.
It is important that the individuals taking the MaxDiff exercise are reflective of those who would ultimately be the buyer or the target market. Frequently, researchers will add demographic questions at the beginning of the survey to ensure irrelevant populations are screened out (e.g., those outside the targeted age range or region where the product will be available). Alternatively, companies will often have lists of current or prospective customers that they can deploy the survey to.
Modeling of MaxDiff Analysis
When analyzing MaxDiff responses, respondent selections are translated into preferences. The outcome of the analysis is a rank-ordered list of the preferences for the different items tested.
At the core of the analysis is the statistical modeling that estimates the utility that respondents assign to each item. MaxDiff analysis gets an intimidating reputation as “complex” because of its statistical modeling, but this is also what has made MaxDiff a world-class research technique. There are several statistical approaches used for calculating these utility preferences, including regression and multinomial logistic regression modeling, which are typically conducted at the aggregate level.
Regardless of the manner in which the survey selections are modeled, the output is a set of utility coefficients that represent the value or preference that the respondent base has for each distinct MaxDiff item. For designs and analysis methods that allow for individual-level calculation of utility scores, we can derive preference models for every single respondent. This can be advantageous for a number of reasons, including segmentation of various data cuts, latent class analysis, and reach simulations. The primary approach taken to yield individual-based utility models is hierarchical Bayes estimation. This is a technique that uses Bayesian methods to probabilistically derive the relative value of each variable being tested.
Hierarchical Bayes Estimation
Hierarchical Bayes (HB) estimation is an iterative process. It encompasses a lower-level model that estimates each individual’s relative utilities for the tested attributes, as well as an upper-level model that predicts the population’s preference. These two work together until the analysis converges on the coefficients that represent the value of each attribute for each individual.
In a sense, HB estimation allows for borrowing information from other responses to gain better and more stable individual-level results. It is very robust and allows us to get great insight into customer preferences, even while presenting fewer tasks to the respondent.
The technique is deemed “hierarchical” because of the higher and lower-level models. This approach estimates the average preferences (higher level) and then gauges how different each respondent is from that distribution to derive their specific utilities (lower level). The process repeats over a number of iterations to ultimately help us hone in on the probability of a specific concept being selected based on its utility (thus a multinomial logistic regression model).
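The core intuition of "borrowing" from the population can be illustrated with a deliberately simplified, non-Bayesian shrinkage sketch. Real HB estimation runs an MCMC sampler over a multinomial logit likelihood; the weighting scheme below is an illustrative assumption only.

```python
def shrink(individual_estimate, population_mean, n_tasks, tasks_for_stability=10):
    """Pull a noisy individual utility toward the population mean.

    The fewer tasks a respondent completed, the more we 'borrow' from
    the population. The weight formula is an illustrative assumption.
    """
    w = n_tasks / (n_tasks + tasks_for_stability)  # data weight in [0, 1)
    return w * individual_estimate + (1 - w) * population_mean

# A respondent who answered only 5 tasks is pulled strongly toward the
# population mean; one who answered 40 tasks mostly keeps their own estimate.
print(shrink(2.0, 0.5, n_tasks=5))    # ~1.0 (weight 1/3 on own data)
print(shrink(2.0, 0.5, n_tasks=40))   # ~1.7 (weight 0.8 on own data)
```

This is why HB can recover stable individual-level utilities even when each respondent completes relatively few tasks: sparse individual data is stabilized by the population-level distribution.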
The Qualtrics MaxDiff Analysis project uses hierarchical Bayes estimation written in Stan to calculate individual preference utilities.
Individual Level Utility Coefficients
The outcome of the Bayesian model is a set of preference scores that represent the utility each individual perceives in each variable. These scores are frequently called partworth utilities, and they are the basis of all summary metrics produced from the MaxDiff study.
The utility file would have a row for each respondent included in the MaxDiff analysis, and a column for each unique level tested within the study. In modeling the preferences of each respondent, the utilities help us predict what selections respondents would make when faced with different lineups.
The utilities are ordinal in nature and tell us the rank order of the list of variables.
MaxDiff Summary Metrics
After the analysis determines the utility coefficients, outputs and deliverables can be prepared to showcase the findings of the study. The utilities are the building blocks of all of the summary metrics.
The core summary metrics that typically accompany MaxDiff analysis are detailed below:
- Preference Share: The preference share is the measurement of the probability that an item would be chosen over another if a respondent was asked to select the best from all options. It is a product of the utilities calculated using a multinomial logistic regression model, and is derived by exponentiating the item’s utility and dividing that by the sum of all items’ exponentiated utilities.
- Average Utility: The average utility score of each item across all respondents. These are ordinal in nature and will show the relative preference between items. The average utilities can give some directional understanding but should not be a standalone metric to summarize the MaxDiff analysis.
- Counts Analysis: The counts analysis is a metric that simply tells us the percentage of times each item was selected as most/least preferred when it was shown.
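The preference-share calculation described above is the multinomial-logit share formula: exp(u_i) / Σ_j exp(u_j). A minimal sketch, using hypothetical average utilities:

```python
import math

# Hypothetical average utilities from a MaxDiff analysis.
utilities = {"chocolate": 1.1, "red_velvet": 0.3, "key_lime": -0.6}

# Preference share: exponentiate each utility, divide by the sum of all
# exponentiated utilities (a softmax over the items).
denom = sum(math.exp(u) for u in utilities.values())
shares = {item: math.exp(u) / denom for item, u in utilities.items()}

# Shares sum to 1 and preserve the rank order of the utilities.
for item, s in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {s:.1%}")
```

Unlike the raw average utilities, these shares are on a ratio scale, which is why preference share is usually the headline metric reported from a MaxDiff study.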
What is Anchored MaxDiff?
Anchored MaxDiff is a supplemental methodology where a follow-up question is asked after each MaxDiff task. It has some similarities to dual-choice conjoint analysis both in how the question is asked as well as how it is modeled.
The approach includes asking a question immediately after each MaxDiff task. After being presented with the list of items, the respondent is asked whether:
- All of the items they see above are important/preferred.
- Some of the items they see above are important/preferred and some unimportant/not preferred.
- All of the items they see above are unimportant/not preferred.
This data is factored into the statistical model. It establishes an anchor point on the utility scale: items with utilities above the anchor are deemed genuinely important or preferred, while items below it are not.
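The practical payoff of anchoring can be sketched as follows: once the model has a zero point, the utilities support an important/unimportant split rather than only a rank order. The utility values below are hypothetical.

```python
# Hypothetical anchored utilities, where 0 is the anchor established by
# the follow-up questions described above.
anchored_utilities = {"chocolate": 1.1, "red_velvet": 0.3,
                      "key_lime": -0.2, "toffee": -0.9}

# Items above the anchor are deemed important/preferred; items below it
# are not, even though all four items still have a rank order.
important = [i for i, u in anchored_utilities.items() if u > 0]
not_important = [i for i, u in anchored_utilities.items() if u <= 0]
print(important)      # ['chocolate', 'red_velvet']
print(not_important)  # ['key_lime', 'toffee']
```

Without the anchor, a standard MaxDiff could only say that key lime beats toffee; it could not say whether either one clears the bar of actually being preferred.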