Don’t Scrap It: Building a QE Ecosystem the Scrappy Way
I joined Qualtrics about two years ago. At the time, our Quality Engineering (QE) team had six people; it has since grown to about 30. Over the same period, the total number of Qualtrics employees grew from 400 to 1400. Growing pains are inevitable at a fast-growing tech company like Qualtrics, which is why it is important for us to be scrappy. At Qualtrics we define scrappy as follows:
We're smart, resourceful and find a way. We write our own story instead of following others.
- We're street fighters with street smarts
- We innovate to do more with less
- We think differently and don't rely on conventional wisdom
- We do hard things
- We nail it then scale it
Scrappiness is one of the principles Qualtrics was built on. It’s easy to see why being resourceful and inventive matters in the early stages of a startup. As a company grows and keeps succeeding, however, it becomes easy to lose that focus. With a valuation of $2.5 billion, $180 million in new funding, and 1400 employees, it would seem natural for us to think, “Qualtrics has plenty of money, so why do I need to look for ways to be scrappy?” or “One employee being scrappy won’t make any difference.” Looking at a single employee, it can be hard to see the benefits of being scrappy. When scrappiness becomes part of the culture for every employee, though, those benefits add up. This is what we strive for at Qualtrics.
One of the challenges of being scrappy is making sure you really think through both the problem at hand and the potential fix. If you try so hard to be scrappy that you ignore the risks of your approach, you may end up with a solution that is cheap but unreliable.
Scrappiness in Action
A great example of scrappiness at Qualtrics is the way the Quality Engineering team approached the task of building automated testing infrastructure. The associated challenges included finding machines to run our ever-increasing number of automated tests, monitoring and maintaining these machines, and running tests on multiple browsers. We were able to find a scrappy, smart solution to these problems.
Finding Testing Machines
About three years ago, as our automated test framework began to mature, we needed machines to host the test runs. While scoping out options for running our automation, we came across IT’s stash of outdated and broken iMacs. These desktop computers had previously been used by other departments, such as customer support. Many of the iMacs that would still power on had cracked screens, bad video drivers, unusable USB ports, or faulty Bluetooth. Those problems are blockers for someone using a machine every day, but they don’t affect a machine’s ability to run tests, so we decided to use the iMacs for our automation.
After our initial scavenging, we ended up with 16 machines in total: a mix of broken and retired iMac desktops, Mac towers, Mac Minis, and a couple of Linux machines to round things out. We crammed them onto an empty desk on the engineering floor, set up the necessary services, and kicked off automated tests. After troubleshooting a few issues, we had our automation fleet up and running. Instead of running everything on our local machines, we could hand test runs off to the fleet, which saved time and let us iterate more quickly. Repurposing these computers also saved us the cost of buying new machines, and it gave them a second life verifying the quality of our product instead of sitting on a shelf.
Expanding The Fleet
Once we had a stable fleet of automation machines up and running, we needed to scale it up. The number of automated tests and the frequency of our test runs kept climbing, so to keep pace we needed more machines and an efficient way to add them to the fleet.
About a year ago, we removed some of the oldest machines and added 16 more iMacs, which brought our total to 33. Around the same time, many employees switched from iMac desktops to laptops, and IT began accumulating MacBooks with problems similar to those of the iMacs in our original fleet, such as cracked screens or broken trackpads. These laptops were either out of warranty or didn’t qualify for warranty repairs, so we stepped in to give them a new home. We added 29 MacBooks to our fleet, bringing us to a total of 62 machines.
As our fleet continues to grow, we keep improving the way we add machines to it. Each machine needs certain services and configuration to run the automated tests. The first machines in our fleet were set up by hand. We have since built an imaging server that holds an image of a machine with all the necessary components installed, and we apply that image to each new machine. Once a machine is re-imaged, we run a script that applies the network configuration specific to that machine. After that, it is ready to go!
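Because every machine receives the same image, what remains after re-imaging is whatever is unique to each machine. As a minimal sketch of what such a per-machine step might compute, the Python below derives a hostname and a static IP address from a machine's position in the fleet. The `qe-auto` naming prefix, the `10.20.0.0/24` subnet, the `config_for` function, and the rule that the first usable address is the gateway are all hypothetical assumptions, not our actual network layout or script.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkConfig:
    hostname: str
    ip: str
    netmask: str
    gateway: str


def config_for(index: int, subnet: str = "10.20.0.0/24",
               prefix: str = "qe-auto") -> NetworkConfig:
    """Derive one machine's network settings from its fleet index.

    Hypothetical scheme: the first usable address in the subnet is the
    gateway, and machine N gets the N-th usable address after it.
    """
    net = ipaddress.ip_network(subnet)
    hosts = list(net.hosts())  # usable addresses, network/broadcast excluded
    gateway = hosts[0]
    if not 1 <= index < len(hosts):
        raise ValueError(f"index {index} out of range for {subnet}")
    return NetworkConfig(
        hostname=f"{prefix}-{index:03d}",  # zero-padded, e.g. qe-auto-007
        ip=str(hosts[index]),
        netmask=str(net.netmask),
        gateway=str(gateway),
    )
```

Deriving everything from a single index keeps the image identical across machines; the script only needs to know which slot in the fleet a machine occupies.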
Running our automation on outdated or broken machines carried risks. Undetected problems with a machine could affect our automation, so we needed a way to verify that the testing machines were running properly before we could trust the results they reported.
To achieve this, we built a monitoring system that periodically checks the health of each testing machine. It verifies basics, such as whether the machine is alive and whether we can SSH into it. It also checks that the necessary services are running and that Selenium WebDriver works properly and can navigate through pages in different browsers. Based on the results, the monitoring system takes action: it automatically restarts services, pulls a machine out of the pool, or alerts the appropriate person when a machine is not functioning properly. This lets us stay scrappy while minimizing the risk of using repurposed machines.
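The escalation logic can be expressed as a small decision function. The sketch below is a hypothetical policy, not the monitoring system's actual rules: it assumes each check produces a boolean, that failed services get exactly one automatic restart before escalating, and that any machine that is unreachable or has a broken WebDriver is pulled from the pool and reported to a person. The names `HealthReport`, `Action`, and `decide_actions` are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESTART_SERVICES = "restart_services"
    PULL_FROM_POOL = "pull_from_pool"
    ALERT = "alert"


@dataclass
class HealthReport:
    """Results of one periodic health check for a single testing machine."""
    alive: bool          # machine responds on the network
    ssh_ok: bool         # an SSH session can be opened
    services_ok: bool    # required test-runner services are running
    selenium_ok: bool    # WebDriver can load and navigate pages
    restart_attempted: bool = False  # services were already restarted once


def decide_actions(report: HealthReport) -> list[Action]:
    """Map a health report to remediation steps (hypothetical policy)."""
    # An unreachable machine cannot be fixed remotely: stop routing tests
    # to it and notify a person.
    if not report.alive or not report.ssh_ok:
        return [Action.PULL_FROM_POOL, Action.ALERT]
    # Stopped services are often fixed by a restart, so try that first,
    # but escalate if a previous restart did not help.
    if not report.services_ok:
        if report.restart_attempted:
            return [Action.PULL_FROM_POOL, Action.ALERT]
        return [Action.RESTART_SERVICES]
    # A broken browser driver makes test results untrustworthy.
    if not report.selenium_ok:
        return [Action.PULL_FROM_POOL, Action.ALERT]
    return [Action.NONE]
```

Ordering the checks from most to least fundamental means a machine that cannot even be reached is never sent a restart it cannot receive.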
Cross-browser testing is an interesting problem because each browser has its own requirements and quirks. We covered Firefox and Chrome with the initial fleet, but other browsers had to be tested manually. Based on browser usage statistics of Qualtrics users, we decided to focus first on Internet Explorer (IE) and mobile Safari. IE is particularly tricky because our testing machines are all Macs, and IE only runs on Windows.
We considered using third-party services for cross-browser testing. However, as our number of tests continued to grow, those options would have become very expensive. Instead, we decided to run tests for every browser on the same machines that already run our Firefox and Chrome tests. For IE, we have scripts that download and provision a Windows virtual machine to run the tests. For iOS, we built a proof of concept using the iOS Simulator that runs natively on macOS together with Appium, an open-source mobile testing framework. The implementation works, and we will be deploying it to all of our testing machines soon.
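Because every browser runs on the same Mac, the test runner only has to decide where on that machine a given session should run. A minimal sketch of that dispatch follows; the target names (`host`, `windows-vm`, `ios-simulator`), the browser keys, and the `route` function are hypothetical and not taken from our actual scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    runner: str   # where on the Mac the test session runs
    driver: str   # automation backend that drives the browser


# Hypothetical routing table: every target lives on the same Mac testing
# machine, so no separate hardware is needed per browser.
ROUTES = {
    "firefox": Target(runner="host", driver="selenium"),
    "chrome": Target(runner="host", driver="selenium"),
    "ie": Target(runner="windows-vm", driver="selenium"),
    "mobile-safari": Target(runner="ios-simulator", driver="appium"),
}


def route(browser: str) -> Target:
    """Return the execution target for a browser name (case-insensitive)."""
    try:
        return ROUTES[browser.lower()]
    except KeyError:
        raise ValueError(f"no route for browser {browser!r}") from None
```

Keeping the mapping in one table means supporting another browser is a single new entry rather than a change to the fleet itself.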
Nailed It, Now Scale It
The Quality Engineering team faced many challenges when we first began building our automation fleet, and we met them with scrappy solutions that accomplished our goals. We can now easily add new machines to the fleet, maintain and monitor them, and use them to provide solid cross-browser coverage. It took many iterations to get what we needed, but the result is a scalable and reliable system.
At our current pace, we can’t scale the automation fleet on laptops that Qualtrics employees happen to damage. However, the scrappy approach that let us try different things with little risk and big rewards is still paying off: our monitoring and testing solutions now scale readily to meet the testing needs of our engineering organization. We are gradually retiring the old machines and expanding the fleet with brand-new Mac Minis. As Qualtrics continues to grow, the requirements for test automation will expand, and our automation fleet will keep scaling with that growth.