Using Limits to Scale Efficiently
Here at Qualtrics, we are making a platform for innovation in the experience management space. The Qualtrics Developer Platform is only the latest evidence of this: from the start, Qualtrics has been about empowering business customers to create their own solutions. Building a platform enables us to foster a diverse ecosystem of use cases, but it also comes with challenges: we can't expect every customer's needs to be met by a single path. We have to provide flexibility, but we also set reasonable limits.
For example, when survey creators make a Text Entry question, we limit responses to 20,000 characters. This prevents respondents from pasting in, say, the entirety of Dostoyevsky's The Brothers Karamazov. Note that such input isn't technically invalid – fine Russian literature is a great example of text input – but we curtail it anyway.
Taking it to the limit
We're not unique in facing this problem: power users always push the limits of their chosen platforms, from video game speedrunners to Excel mavens. On our platform, we've seen that, absent limits, our most prolific customers will create data more than 100× larger than the 99th percentile.
This raises the question: What motivates these customers? Are they bad actors trying to break our product? (Sometimes, but not often.) Enthusiasts who enjoy turning it up to 11? (Maybe our QE team.)
Our experience is that most of our limit-pushers are just customers who have a tough problem to solve, getting their job done in a way that doesn't line up well with the assumptions of the product.
Why can't we just scale?
Scale is a fantastic goal, and we do indeed build systems that scale quite well. However, allowing large amounts of data in progressively more dimensions quickly takes us into computing's own variant of the "curse of dimensionality": While scaling by 100× is doable, scaling by 100× in each of five different dimensions means scaling by 100⁵ = 10¹⁰, or ten billion times. This is more difficult.
It's important to note that such scaling isn't impossible, but it involves significant engineering effort. Given that our engineering budget is limited, limits let us be intentional in choosing to scale in dimensions that correlate with business impact: We can avoid fretting about Russian novels in our responses and instead focus on easily handling millions of (more plausibly-sized) survey responses each day.
Besides focusing our engineering efforts, limits give customers fast feedback when their solution doesn't fit a given part of the platform. For example, a web survey is optimized for capturing experiences in the moment, and isn't the right mechanism for submitting NaNoWriMo drafts.
Like a lane marker on a road, a limit provides a clear boundary that helps customers "stay on the path": By being exposed to the edges, customers are nudged closer to best practices. Compared to up-front training or out-of-band documentation, limits are contextual and organic: They surface their information to customers just as it becomes relevant.
If customers are consistently hitting certain limits, it may signal something different: that we need to provide new functionality in our platform. Sometimes there is already a better tool available, but the customer hasn't found it. For example, tracking transactions directly as fields in individual XM Directory contacts can be a headache (since each contact can have a lot of transactions), which is why we support transactions as first-class items.
Limits can teach customers about your product and expose potential feature gaps (or discoverability gaps) to engineers.
Knowing your limits
How do you find good places to set limits? Good ideas come from both business and technical requirements.
Business requirements often push for more capability: higher limits, or even no limits. But there are good business reasons to set limits: price tiering can help provide appropriate levels of service to a diverse customer base, and limits can also help to signal membership in a particular category. Limits can even simplify your product UX: A widget that allows choosing between 1,000 options is much more difficult to navigate (and to design well) than one that only allows six.
On the technical side, backend systems bring along their own limits. We use scale testing and careful monitoring of metrics to find out where these limits currently are. Additionally, we often over-constrain our service limits, so even if a service can handle payloads of up to 100GB, we may restrict inputs to a much smaller size (say, 10MB) if that meets the needs of customers. This lets us focus effort on the dimensions that will provide the most value, like request volume.
Once you've defined a good set of limits, how do you implement them? Here are some strategies we've used in our microservice architecture:
First, the back end: Individual services should set their own authoritative limits, ensuring that inputs don't provoke service outages. It's important to let service consumers know why their requests are being rejected, too. As an example, a RESTful web service can signal a limit with HTTP status codes like 413 (Payload Too Large) or 429 (Too Many Requests).
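As a concrete sketch of such a service-side check (in Python, with illustrative limit values and function names, not our actual configuration), the decision logic might look like:

```python
# Authoritative back-end limit checks for a single endpoint.
# The specific values here are hypothetical examples.

MAX_BODY_BYTES = 10 * 1024 * 1024   # reject payloads over 10 MB
MAX_REQUESTS_PER_MINUTE = 600       # simple per-client rate cap

def check_request(body_size: int, requests_this_minute: int) -> tuple[int, str]:
    """Return an (HTTP status, message) pair; 200 means the request may proceed."""
    if body_size > MAX_BODY_BYTES:
        return 413, f"Payload exceeds the {MAX_BODY_BYTES}-byte limit."
    if requests_this_minute > MAX_REQUESTS_PER_MINUTE:
        return 429, "Rate limit exceeded; retry after a short backoff."
    return 200, "OK"
```

Returning a human-readable message alongside the status code is what keeps the rejection informative rather than mysterious.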
But don't stop there: Making limits visible in the front end will provide a smoother experience for customers, advising them of limits right as they run into them, or even earlier. For example, instead of returning an error when a customer saves a description over 500 characters, you might show a warning when they type the 400th character, then stop accepting text entirely at 500.
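The warn-then-block behavior above is a simple thresholding rule; here is a minimal, language-agnostic sketch (shown in Python, with the 400/500 thresholds from the example):

```python
# Front-end limit feedback for a 500-character description field:
# warn early, then stop accepting input at the hard limit.

WARN_AT = 400
MAX_LEN = 500

def entry_state(text: str) -> str:
    """Classify the field's state so the UI can react as the user types."""
    if len(text) >= MAX_LEN:
        return "blocked"   # UI stops accepting further characters
    if len(text) >= WARN_AT:
        return "warning"   # UI shows a "characters remaining" notice
    return "ok"
```

The same thresholds should come from (or match) the back end's authoritative limit, so the two layers never disagree.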
If you have an existing system that doesn't have limits, it's not too late to set some! First, survey the territory: Do some analytics and find out what the distribution of data is. What are your pathological cases, and how big are they?
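Surveying the territory can be as simple as comparing the tail of the distribution to the extremes. A rough sketch (the metric and the nearest-rank percentile choice are illustrative):

```python
import math
import statistics

def size_report(sizes: list[int]) -> dict:
    """Summarize a size distribution: median, p99, max, and how far the max
    outruns the p99 -- a quick read on how pathological the outliers are."""
    ordered = sorted(sizes)
    idx = max(0, math.ceil(0.99 * len(ordered)) - 1)  # nearest-rank p99
    p99 = ordered[idx]
    return {
        "median": statistics.median(ordered),
        "p99": p99,
        "max": ordered[-1],
        "max_over_p99": ordered[-1] / p99,
    }
```

A large `max_over_p99` ratio is exactly the kind of evidence that makes the case for limits to decision makers.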
Our own analysis of survey definitions produced some customer creations that were 100× larger than we thought they needed to be, which was surprising and a call to action. Sharing this data with decision makers drove support for the idea of limits, and the analysis provided input for defining reasonable, informed limits.
You'll always need to make room for the exceptional: You probably have some good customers that have created data that falls afoul of the limits you'd like to impose, and they may merit a grandfather clause. Even after broadly imposing reasonable limits, you may choose to exempt some strategic customers from them, enabling larger, more complex use cases.
The important thing with any exception is to make it exceptional: By default, the limit should be imposed on new customers and data, with an override available on a case-by-case basis.
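One simple way to keep exceptions exceptional is a default-on limit with an explicit override table; everything below (account IDs, limit values) is hypothetical:

```python
# Default-on limits with reviewed, case-by-case overrides.

DEFAULT_MAX_QUESTIONS = 1_000

# Grandfathered or strategic accounts get an explicit, documented exception.
OVERRIDES = {
    "acct-legacy-042": 5_000,
}

def question_limit(account_id: str) -> int:
    """Everyone gets the default unless they are explicitly listed."""
    return OVERRIDES.get(account_id, DEFAULT_MAX_QUESTIONS)
```

Because the override table is small and explicit, it doubles as the list of customers to follow up with when closing the loop.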
Closing the loop
Once you've got general limits in place, take time to talk with the customers you've granted exceptions to. See if you can help them reach their objectives without falling afoul of the new limits. You may find a better way to guide other customers away from similar pitfalls, or even a product gap you can remedy. Remember, good limits guide engineering teams as well as customers.
Be happier with limits
At Qualtrics, we've found success in imposing reasonable limits throughout our platform. From the back end to the front end, we've worked hard to define bright-line boundaries between productive and pathological usage. Sometimes we've defined limits at the outset of a new service; other times, we have had to retrofit existing systems with limits in response to new concerns. We haven't regretted it.
From text response length to file upload size, limits have helped us focus on providing reliability at scale in the dimensions that matter to our business. We've also directed customers toward better utilization of our product and gently steered them away from usage patterns that would be detrimental to our system and to their own productivity.
As you define and grow your own products, take time to define and apply reasonable limits. You'll find there's no limit to what you can accomplish!