
Brand Experience
Synthetic panels in market research: What you need to know
In the modern research landscape, the need for rapid insights – and more research in general – has driven marketers and researchers to consider new AI-generated methods, like synthetic panels, that can significantly expand the breadth and speed of findings. Synthetic responses offer exciting new possibilities for gathering consumer insights, but as with any innovation, it's crucial to separate the value from the noise. Not all synthetic research solutions are created equal, and understanding the nuances can be pivotal for your research strategy.
Understanding synthetic responses
Before diving into the intricacies, let's define what synthetic panel responses are. At their core, these are AI-generated responses that mimic real human feedback. While they hold immense potential to bolster your insights, the approach and methodology behind their generation vary significantly. Here's a closer look at the three primary types of synthetic research offerings you'll encounter as you navigate this crowded space:
1. Large Language Model (LLM) Wrapper
The most common form of synthetic response generation is a "wrapper" model, which places a polished interface around a more established model like ChatGPT. Here, the responses are crafted using sophisticated prompt engineering (that's the real IP) that draws from publicly available information. While this can yield relevant data in aggregate, the micro-level data tends to be more uniform and lacking in granularity. A major caution here is the limited ability to "slice and dice" results by segment or demographic. This approach asks you to specify the audience you're interested in surveying, upload your survey or add your questions, and sit back as the interface generates responses from a general-purpose model with billions of parameters trained on publicly available data (the same kinds of sources behind Claude, OpenAI's GPT models, Llama, Gemini, Mistral, etc.).
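To make that concrete, here is a minimal sketch of what a wrapper tool does behind the scenes: it folds an audience description and a survey question into a persona prompt and sends it to a general-purpose LLM. The model name, prompt wording, and client setup are illustrative assumptions, not any vendor's actual implementation.

```python
# Minimal sketch of an LLM "wrapper": the product's real IP is the prompt,
# not the model. Model name and prompt wording are illustrative assumptions.
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def synthetic_response(audience: str, question: str) -> str:
    """Ask a general-purpose LLM to answer one survey question in persona."""
    prompt = (
        f"You are a survey respondent matching this profile: {audience}. "
        f"Answer the following question in one or two sentences, "
        f"in the first person:\n\n{question}"
    )
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # any general-purpose chat model
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,      # some variety across simulated respondents
    )
    return completion.choices[0].message.content


# Example: simulate a handful of UK grocery shoppers
answers = [
    synthetic_response(
        "UK consumer, aged 25-34, shops for groceries online weekly",
        "How likely are you to try a new plant-based snack brand, and why?",
    )
    for _ in range(5)
]
```

Because every simulated respondent is drawn from the same general-purpose model, the answers tend toward the uniform, which is exactly the "slice and dice" limitation noted above.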
LLM wrapper at-a-glance:
How it works: Utilizes an LLM trained on generic, publicly available data, with the intellectual property arising from effective prompt engineering. Beyond simply generating data, the model can perform additional tasks such as reasoning, enhancing its utility in various applications.
Good for: Quickly providing insights on generic topics, answering individual questions swiftly, and performing basic reasoning and judgment. While it can address a wide range of subjects, its responses may not always be highly detailed or accurate.
Not as good for: Answering questions on niche topics that are not widely discussed in public, addressing current events or recent developments, or generating quantitative data and survey responses. It is primarily designed for conversational engagement rather than delivering accurate Net Promoter Scores, Customer Satisfaction scores, or responses on a Likert scale.
2. Machine Learning (ML)-Powered Model
Another prevalent method for synthetic market research involves ML models that leverage a base of human-collected responses to extrapolate findings. For instance, if you conduct a concept test study with 300 UK consumers, the model can learn from those 300 responses and generate 300 more unique synthetic responses. This application is powerful for enhancing the breadth of your data, but it depends heavily on the quality and representativeness of the original responses used to inform the model. ML-powered boosting of human-collected data can be enhanced by proprietary foundational models, which are described next.
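As an illustration of how ML-powered boosting can work, the sketch below fits a random forest (one of the techniques listed in the at-a-glance summary) on a small set of human-collected concept-test responses, then samples additional synthetic ratings for resampled respondent profiles. The file name, column names, and sampling strategy are assumptions for illustration, not a description of any specific vendor's model.

```python
# Sketch: boost 300 human concept-test responses with 300 synthetic ones.
# File name, column names, and sampling strategy are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Human-collected data: demographics plus a 1-5 purchase-intent rating
human = pd.read_csv("uk_concept_test_300.csv")          # hypothetical file
features = pd.get_dummies(human[["age_band", "region", "income_band"]])
ratings = human["purchase_intent"]                       # values 1-5

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(features, ratings)

# Create 300 synthetic respondents by resampling observed demographic profiles,
# then draw each rating from the model's predicted class probabilities so the
# synthetic data varies rather than collapsing to the single most likely answer.
rng = np.random.default_rng(0)
synthetic_profiles = features.sample(n=300, replace=True, random_state=0)
probabilities = model.predict_proba(synthetic_profiles)
synthetic_ratings = [rng.choice(model.classes_, p=p) for p in probabilities]

synthetic = synthetic_profiles.assign(purchase_intent=synthetic_ratings)
```

The dependency called out above is visible here: the synthetic rows can only recombine patterns present in the original 300 responses, so the quality and representativeness of that seed data carry through directly.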
ML model at-a-glance:
How it works: Trained through supervised learning on tabular data, using techniques such as Decision Trees, XGBoost, Random Forest, or Neural Networks. The model's intellectual property derives from the training data and the chosen modeling technique.
Good for: Data imputation, filling gaps based on historically similar examples, replicating data patterns, generating quantitative data for the specific use cases it has been trained on, and making predictions.
Not as good for: Anything beyond single-purpose tasks, as it lacks flexibility and extensibility. It can only operate on the data and situations it was trained on, cannot perform reasoning, and is unable to generate qualitative feedback.
3. Foundational Large Language Model (LLM)
The most sophisticated and complex type of synthetic panel generation involves foundational models that utilize vast pools of proprietary human response data alongside extensive open data sources. These models are capable of delivering granular, targeted insights that resonate with specific cohorts or market segments. This approach enriches enormous base models with highly specific data to produce tailored responses that reflect the nuances of diverse populations more accurately than their less complex counterparts. Early innovation shows these LLMs deliver smarter and more accurate insights as a result of the breadth and depth of the data being incorporated.
Proprietary data and licensed data sources—in addition to publicly available data—are consulted to produce the requisite volume of responses needed.
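One way to picture where a foundational model gets its edge is the training data it sees. The sketch below, a rough illustration rather than any provider's pipeline, converts proprietary panel responses into the kind of prompt/completion pairs a base LLM could be fine-tuned on; the file name, field names, and JSONL layout are assumptions.

```python
# Sketch: turn proprietary panel responses into fine-tuning examples.
# File name, field names, and the JSONL layout are illustrative assumptions.
import json

import pandas as pd

panel = pd.read_csv("proprietary_panel_responses.csv")  # hypothetical first-party data

with open("finetune_examples.jsonl", "w") as f:
    for _, row in panel.iterrows():
        example = {
            "messages": [
                {
                    "role": "system",
                    "content": f"You are a survey respondent: {row['segment_description']}",
                },
                {"role": "user", "content": row["question_text"]},
                {"role": "assistant", "content": row["verbatim_response"]},
            ]
        }
        f.write(json.dumps(example) + "\n")

# A base model fine-tuned on examples like these learns how specific cohorts
# actually answer survey questions, rather than how the public internet talks
# about them, which is the gap this approach aims to close.
```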
Foundational LLM model at-a-glance:
How it works: Operates similarly to wrapper models, but is trained on task-specific data, which is often proprietary or non-public. It is designed for a specific task, with its intellectual property deriving from the training data used.
Good for: Providing insights on topics and questions it has been trained or fine-tuned to address, generating qualitative feedback on those topics, and performing reasoning and judgment within its domain. While it can respond to a wide range of subjects, its performance is often better than that of generic LLMs, particularly for answering survey questions, as it has been specifically trained for such tasks.
Not as good for: Imputation and generating quantitative data or predictions, where machine learning models still have the edge, although it performs better than out-of-the-box LLMs. It also struggles to answer questions on out-of-sample topics or questions outside its training scope.
Define your objectives and use cases
Which synthetic offering you ultimately decide to use should be predicated not only on the goals of your research project, but also on factors like the resources you have available and the volume of data you're working with. For example, foundational models may perform better for large datasets due to their robustness and ability to generalize across a variety of contexts, while smaller datasets may call for simpler machine learning models or wrappers to avoid overfitting, where a model learns the training data too well and performs poorly on unseen data.
From a resource perspective, wrapper models require the least technical expertise to implement, but the trade-off is that you lose some of the flexibility that comes with foundational and machine learning models. Additionally, LLM wrapper models and machine learning models typically have lower resource requirements than foundational LLMs, which need significant computational power for training and deployment.
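For the dataset-size point above, a simple holdout comparison can flag overfitting before you commit to an approach. The sketch below is a generic check on stand-in data, not a vendor-specific diagnostic.

```python
# Sketch: a quick overfitting check on a small tabular dataset.
# A large gap between training accuracy and cross-validated accuracy suggests
# the model is memorizing the training data rather than generalizing.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for a small survey dataset (300 respondents, 10 features)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

train_acc = model.score(X, y)
cv_acc = cross_val_score(model, X, y, cv=5).mean()
print(f"train accuracy: {train_acc:.2f}, cross-validated accuracy: {cv_acc:.2f}")
```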
What to look out for
It’s essential to approach synthetic panel providers with a critical eye. To help you navigate this growing landscape, here are some questions to consider in your evaluation of potential synthetic response tools:
- What type of model do you leverage? Understanding the methodology behind the synthetic responses is key. Are you looking at a wrapper, a machine learning model, or a foundational model?
- How is the model trained? The training process can significantly affect the outputs the model generates. Is it based on a wide array of reputable sources? What is the demographic makeup of the data used?
- How often is the model updated? A model that learns and adapts over time can be far more valuable than one that relies on static data sets.
- How has their approach been validated? What is the validation process and what type of standards of excellence do you hold your models to?
- Can you generate completely new responses to any topic? The capacity for creativity in response generation can reveal the model's sophistication.
- Which populations can you model, and which can't you? Knowing the limitations of the provider's capabilities can help you avoid misalignment with your target audience.
Research teams that aren't sure where to begin with synthetic data should start small with pilot projects to assess its effectiveness and gather feedback before a full-scale rollout. By monitoring how well the synthetic data meets your defined objectives, adjustments can be made and insights gleaned for future projects. As the organization becomes more comfortable with synthetic data, expand its use to additional use cases and projects.
Synthetic data for your organization
In a world where insights are critical for staying ahead, synthetic responses present both opportunities and challenges. By taking the time to understand the different types of synthetic models and asking the right questions of potential providers, marketers and market researchers can harness these tools more effectively. Don’t fall for the hype—ensure that the synthetic responses you integrate into your strategies are as robust, accurate, and valuable as they can be.
See how organizations like Booking.com have integrated synthetic responses into their research approaches