
Brand Experience
Synthetic panels in market research: What you need to know
In the modern research landscape, the need for rapid insights – and more research in general – has driven marketers and researchers to consider new AI-generated methods, like synthetic panels, that can significantly expand the breadth and speed of findings. Synthetic responses offer exciting new possibilities for gathering consumer insights, but as with any innovation, it's crucial to separate the value from the noise. Not all synthetic research solutions are created equal, and understanding the nuances can be pivotal for your research strategy.
Understanding synthetic responses
Before diving into the intricacies, let's define what synthetic panel responses are. At their core, these are AI-generated responses that mimic real human feedback. While they hold immense potential to bolster your insights, the approach and methodology behind their generation vary significantly. Here's a closer look at the three primary types of synthetic research offerings you'll encounter as you navigate this crowded space:
1. Large Language Model (LLM) Wrapper
The most common form of synthetic response generation is a "wrapper" model, which places a polished interface around a more established model like ChatGPT. Here, the responses are crafted using sophisticated prompt engineering (that's the real IP) that draws from publicly available information. While this can yield relevant data in aggregate, the micro-level data tends to be more uniform and lacking in granularity. A major caution here is the limited ability to "slice and dice" results by segment or demographic. This approach asks you to specify the audience you're interested in surveying, upload your survey or add your questions, and sit back as the interface generates responses from a general-purpose model with billions of parameters trained on publicly available data (the same kinds of sources behind Claude, OpenAI's GPT models, Llama, Gemini, Mistral, etc.).
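To make that concrete, here is a minimal sketch of what a wrapper tool does behind the scenes: it folds an audience description and a survey question into a persona prompt and sends it to a general-purpose LLM. The model name, prompt wording, and client setup are illustrative assumptions, not any vendor's actual implementation.

```python
# Minimal sketch of an LLM "wrapper": the product's real IP is the prompt,
# not the model. Model name and prompt wording are illustrative assumptions.
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def synthetic_response(audience: str, question: str) -> str:
    """Ask a general-purpose LLM to answer one survey question in persona."""
    prompt = (
        f"You are a survey respondent matching this profile: {audience}. "
        f"Answer the following question in one or two sentences, "
        f"in the first person:\n\n{question}"
    )
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # any general-purpose chat model
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,      # some variety across simulated respondents
    )
    return completion.choices[0].message.content


# Example: simulate a handful of UK grocery shoppers
answers = [
    synthetic_response(
        "UK consumer, aged 25-34, shops for groceries online weekly",
        "How likely are you to try a new plant-based snack brand, and why?",
    )
    for _ in range(5)
]
```

Because every simulated respondent is drawn from the same general-purpose model, the answers tend toward the uniform, which is exactly the "slice and dice" limitation noted above.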
LLM wrapper at-a-glance:
How it works: Utilizes an LLM trained on generic, publicly available data, with the intellectual property arising from effective prompt engineering. Beyond simply generating data, the model can perform additional tasks such as reasoning, enhancing its utility in various applications.
Good for: Quickly providing insights on generic topics, answering individual questions swiftly, and performing basic reasoning and judgment. While it can address a wide range of subjects, its responses may not always be highly detailed or accurate.
Not as good for: Answering questions on niche topics that are not widely discussed in public, addressing current events or recent developments, or generating quantitative data and survey responses. It is primarily designed for conversational engagement rather than delivering accurate Net Promoter Scores, Customer Satisfaction scores, or responses on a Likert scale.
2. Machine Learning (ML)-Powered Model
Another prevalent method for synthetic market research involves ML models that leverage a base of human-collected responses to extrapolate findings. For instance, if you conduct a concept test study with 300 UK consumers, the model can learn from those 300 responses and generate 300 more unique synthetic responses. This application is powerful for enhancing the breadth of your data, but it depends heavily on the quality and representativeness of the original responses used to inform the model. ML-powered boosting of human-collected data can be enhanced by proprietary foundational models, which are described next.
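As an illustration of how ML-powered boosting can work, the sketch below fits a random forest (one of the techniques listed in the at-a-glance summary) on a small set of human-collected concept-test responses, then samples additional synthetic ratings for resampled respondent profiles. The file name, column names, and sampling strategy are assumptions for illustration, not a description of any specific vendor's model.

```python
# Sketch: boost 300 human concept-test responses with 300 synthetic ones.
# File name, column names, and sampling strategy are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Human-collected data: demographics plus a 1-5 purchase-intent rating
human = pd.read_csv("uk_concept_test_300.csv")          # hypothetical file
features = pd.get_dummies(human[["age_band", "region", "income_band"]])
ratings = human["purchase_intent"]                       # values 1-5

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(features, ratings)

# Create 300 synthetic respondents by resampling observed demographic profiles,
# then draw each rating from the model's predicted class probabilities so the
# synthetic data varies rather than collapsing to the single most likely answer.
rng = np.random.default_rng(0)
synthetic_profiles = features.sample(n=300, replace=True, random_state=0)
probabilities = model.predict_proba(synthetic_profiles)
synthetic_ratings = [rng.choice(model.classes_, p=p) for p in probabilities]

synthetic = synthetic_profiles.assign(purchase_intent=synthetic_ratings)
```

The dependency called out above is visible here: the synthetic rows can only recombine patterns present in the original 300 responses, so the quality and representativeness of that seed data carry through directly.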
ML model at-a-glance:
How it works: Trained through supervised learning on tabular data, using techniques such as Decision Trees, XGBoost, Random Forest, or Neural Networks. The model's intellectual property derives from the training data and the chosen modeling technique.
Good for: Data imputation, filling gaps based on historically similar examples, replicating data patterns, generating quantitative data for the specific use cases it has been trained on, and making predictions.
Not as good for: Anything beyond single-purpose tasks, as it lacks flexibility and extensibility. It can only operate on the data and situations it was trained on, cannot perform reasoning, and is unable to generate qualitative feedback.
3. Foundational Large Language Model (LLM)
The most sophisticated and complex type of synthetic panel generation involves foundational models that utilize vast pools of proprietary human response data alongside extensive open data sources. These models are capable of delivering granular, targeted insights that resonate with specific cohorts or market segments. This approach enriches enormous base models with highly specific data to produce tailored responses that reflect the nuances of diverse populations more accurately than their less complex counterparts. Early innovation shows these LLMs deliver smarter and more accurate insights as a result of the breadth and depth of the data being incorporated.
Proprietary data and licensed data sources—in addition to publicly available data—are consulted to produce the requisite volume of responses needed.
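One way to picture where a foundational model gets its edge is the training data it sees. The sketch below, a rough illustration rather than any provider's pipeline, converts proprietary panel responses into the kind of prompt/completion pairs a base LLM could be fine-tuned on; the file name, field names, and JSONL layout are assumptions.

```python
# Sketch: turn proprietary panel responses into fine-tuning examples.
# File name, field names, and the JSONL layout are illustrative assumptions.
import json

import pandas as pd

panel = pd.read_csv("proprietary_panel_responses.csv")  # hypothetical first-party data

with open("finetune_examples.jsonl", "w") as f:
    for _, row in panel.iterrows():
        example = {
            "messages": [
                {
                    "role": "system",
                    "content": f"You are a survey respondent: {row['segment_description']}",
                },
                {"role": "user", "content": row["question_text"]},
                {"role": "assistant", "content": row["verbatim_response"]},
            ]
        }
        f.write(json.dumps(example) + "\n")

# A base model fine-tuned on examples like these learns how specific cohorts
# actually answer survey questions, rather than how the public internet talks
# about them, which is the gap this approach aims to close.
```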
Foundational LLM model at-a-glance:
How it works: Operates similarly to wrapper models, but is trained on task-specific data, which is often proprietary or non-public. It is designed for a specific task, with its intellectual property deriving from the training data used.
Good for: Providing insights on topics and questions it has been trained or fine-tuned to address, generating qualitative feedback on those topics, and performing reasoning and judgment within its domain. While it can respond to a wide range of subjects, its performance is often better than that of generic LLMs, particularly for answering survey questions, as it has been specifically trained for such tasks.
Not as good for: Imputation and generating quantitative data or predictions, where machine learning models still have the edge, although it performs better than out-of-the-box LLMs. It also struggles to answer questions on out-of-sample topics or questions outside its training scope.
Define your objectives and use cases
Which synthetic offering you ultimately decide to use should be predicated not only on the goals of your research project, but also on factors like the resources you have available and the volume of data you're working with. For example, foundational models may perform better for large datasets due to their robustness and ability to generalize across a variety of contexts, while smaller datasets may call for simpler machine learning models or wrappers to avoid overfitting, where a model learns the training data too well and performs poorly on unseen data.
From a resource perspective, wrapper models require the least technical expertise to implement, but the trade-off is that you lose some of the flexibility that comes with foundational and machine learning models. Additionally, LLM wrapper models and machine learning models typically have lower resource requirements than foundational LLMs, which need significant computational power for training and deployment.
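For the dataset-size point above, a simple holdout comparison can flag overfitting before you commit to an approach. The sketch below is a generic check on stand-in data, not a vendor-specific diagnostic.

```python
# Sketch: a quick overfitting check on a small tabular dataset.
# A large gap between training accuracy and cross-validated accuracy suggests
# the model is memorizing the training data rather than generalizing.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for a small survey dataset (300 respondents, 10 features)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

train_acc = model.score(X, y)
cv_acc = cross_val_score(model, X, y, cv=5).mean()
print(f"train accuracy: {train_acc:.2f}, cross-validated accuracy: {cv_acc:.2f}")
```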
What to look out for
It’s essential to approach synthetic panel providers with a critical eye. To help you navigate this growing landscape, here are some questions to consider in your evaluation of potential synthetic response tools:
- What type of model do you leverage? Understanding the methodology behind the synthetic responses is key. Are you looking at a wrapper, a machine learning model, or a foundational model?
- How is the model trained? The training process can significantly affect the outputs the model generates. Is it based on a wide array of reputable sources? What is the demographic makeup of the data used?
- How often is the model updated? A model that learns and adapts over time can be far more valuable than one that relies on static data sets.
- How has their approach been validated? What is the validation process and what type of standards of excellence do you hold your models to?
- Can you generate completely new responses to any topic? The capacity for creativity in response generation can reveal the model's sophistication.
- Which populations can you model, and which can't you? Knowing the limitations of the provider's capabilities can help you avoid misalignment with your target audience.
Research teams that aren't sure where to begin with synthetic data should start small with pilot projects to assess its effectiveness and gather feedback before a full-scale rollout. By monitoring how well the synthetic data meets your defined objectives, adjustments can be made and insights gleaned for future projects. As the organization becomes more comfortable with synthetic data, expand its use to additional use cases and projects.
Synthetic data for your organization
In a world where insights are critical for staying ahead, synthetic responses present both opportunities and challenges. By taking the time to understand the different types of synthetic models and asking the right questions of potential providers, marketers and market researchers can harness these tools more effectively. Don’t fall for the hype—ensure that the synthetic responses you integrate into your strategies are as robust, accurate, and valuable as they can be.
See how organizations like Booking.com have integrated synthetic responses into their research approaches