Build Isolation at Qualtrics
One of the key goals of build engineering is to create repeatable builds, and one way to help achieve that is to isolate them. As you are all well aware, builds follow a familiar cycle, regardless of language, framework, or platform:
- Source code is written by developers
- Code is transformed one or more times (aka compilation)
- Unit tests are executed
- Artifacts are packaged
- Functional tests are executed against those packages
- Packages are published to one or more repositories
The environment in which all this takes place may vary not only from one project to the next but also from one execution to the next. Anyone familiar with execution variations has surely encountered gnarly bugs and strange behavior that can occur as a result. Ideally, these issues are caught before deploying to production, but occasionally such problems can escape to impact customers. This is a short story of how Qualtrics isolates builds to reduce these occurrences.
The story starts with the build server. At Qualtrics, developers work primarily on Mac laptops while the production machines run CentOS. Qualtrics deploys Jenkins as a build server, which by default executes builds on machines that are effectively the same as production machines. The operating system, libraries and tools on the build machines are controlled by the same mechanism as on production hosts. The build server not only provides automation, but also provides isolation by executing all builds in similar production-like environments. Releases and the official state of a build are now arbitrated by the independent Jenkins hosts and not by developers and their Macs.
However, there are some issues. First, builds are not isolated from one another. Consider a build which installs or changes a system library or tool (e.g. gcc, javac, thrift, protoc, etc.). This can change the behavior of later builds of other projects on the same machine, and it will certainly create variations across build machines! Second, the build machines are only as consistent as the process that maintains them. Entropy being what it is, over time some of our machines can drift from each other as well as from the ideal. This can occur either through unintended or accidental changes to a machine, or because the ideal moves forward while some machines are neglected or forgotten. One way to remedy this is to provision your Jenkins slaves on demand and destroy them after each build, but this is a little heavy-handed.
Instead, we use a lighter-weight solution to isolate the build environment: Docker. We define and build a Docker image which is run to execute the build. All build dependencies are encompassed in that Docker image. The Docker image is versioned (do not use `latest`) and each project declares the version of the build image to use. This isolates the project’s build from any changes to the libraries and tools on the underlying Jenkins host. Further, any side effects of the build are confined to the running container and discarded when the build completes. Jenkins still provides the automation and ensures consistency through its operating system version and the version of the Docker daemon.
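As an illustration of the "do not use `latest`" rule, a project could reject unpinned image references before a build starts. The following is only a sketch; the function name `is_pinned` and the example references are hypothetical and not part of our tooling.

```python
def is_pinned(image_ref: str) -> bool:
    """Return True if the reference names an explicit, non-latest version."""
    # A digest (image@sha256:...) always identifies exactly one image.
    if "@" in image_ref:
        return True
    # Look for a tag only in the last path component, so that a registry
    # port (registry.example.com:5000/image) is not mistaken for a tag.
    last_component = image_ref.rsplit("/", 1)[-1]
    _name, sep, tag = last_component.partition(":")
    return sep == ":" and tag not in ("", "latest")


if __name__ == "__main__":
    # Hypothetical references, for illustration only.
    for ref in ("registry.example.com:5000/build/java:1.4.2", "build/java:latest"):
        print(ref, "->", "pinned" if is_pinned(ref) else "NOT pinned")
```

An untagged reference is treated as unpinned because Docker resolves it to `latest` by default.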
The final step was introducing Vagrant to our build story. This enables us to provide build isolation on developer Macs as well. When we started this story, Docker for Mac was not yet available. To work around this, Vagrant launches a virtual machine that looks like a Jenkins host (without Jenkins) and allows developers to execute the same Docker builds. Together, Jenkins, Docker, and Vagrant have unified the build landscape, resulting in isolated builds which are repeatable and have consistent, well-declared dependencies.
However, the story doesn’t end here. The stability provided by these build tools led to a proliferation of build scripts across our projects. The next section discusses how we provide a generic build process across languages and frameworks.
Builder Bootstrap Script
In order to unify the build process and environment we created a build bootstrap script. This script is not a new build tool. The script is for launching the project’s existing build tool into an isolated runtime environment using Docker and Vagrant. Consequently, the same script is used across all projects, languages and frameworks and provides configuration hooks to allow for customization on a per-project and per-environment basis. Any arguments to the script are passed to the project’s build tool. For example:
- Without builder: `mvn clean verify`
- With builder: `./build clean verify`
The script automatically detects most of its configuration, such as the build tool, but all settings may be overridden. Specifically, configuration per project is specified via a `.build` file and configuration per environment through environment variables prefixed with `BUILDER_`. There are two required configuration options:
- `DOCKER_IMAGE`: The name of the Docker image in the registry to run the build in.
- `DOCKER_REGISTRY`: The URI of the Docker registry to use.
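To make the two configuration layers concrete, here is a minimal sketch of how they could be merged. It rests on assumptions that the description above does not settle: that `.build` contains simple `KEY=VALUE` lines, that a variable such as `BUILDER_DOCKER_IMAGE` maps to the `DOCKER_IMAGE` setting, and that the environment takes precedence over the file. The function names are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = ("DOCKER_IMAGE", "DOCKER_REGISTRY")
ENV_PREFIX = "BUILDER_"


def parse_build_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines (an assumed format), skipping blanks and comments."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def resolve_config(build_file_text: str,
                   environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Merge per-project settings with per-environment overrides."""
    environ = os.environ if environ is None else environ
    config = parse_build_file(build_file_text)
    # Assumed precedence: BUILDER_-prefixed variables override the .build file.
    for name, value in environ.items():
        if name.startswith(ENV_PREFIX):
            config[name[len(ENV_PREFIX):]] = value
    missing = [key for key in REQUIRED if not config.get(key)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return config
```

Failing fast on a missing required option keeps a misconfigured project from silently building outside the intended image.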
If the local machine does not have a running Docker daemon, then the build script will rerun itself inside Vagrant and then start the build inside of Docker. Otherwise, the build script starts the build tool in native Docker. For reference, the Vagrantfile we use looks like this.
The build script accepts these optional configuration options:
- `BUILD_TOOL`: The command to execute for the build. Overrides any auto-detected build tool.
- `BUILD_ARGUMENTS`: The arguments to pass to the build tool. Overrides any auto-detected arguments.
- `ADDITIONAL_BUILD_ARGUMENTS`: Additional arguments to pass to the build tool in addition to auto-detected arguments.
- `BUILD_ENVIRONMENT`: The environment key-value pairs to set. Overrides any auto-detected environment variables.
- `ADDITIONAL_BUILD_ENVIRONMENT`: Additional environment key-value pairs to set in addition to auto-detected environment variables.
- `VAGRANT_EXPOSED_PORTS`: Ports to map out of Vagrant.
- `DOCKER_EXPOSED_PORTS`: Ports to map out of Docker.
- `REQUIRE_VAGRANT`: Always execute in Vagrant even if Docker is available.
- `FORCE_NATIVE`: Never execute in Vagrant or Docker.
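The choice between native, Vagrant and Docker execution can be sketched as a small decision function. This is an illustration rather than the script's actual logic: the accepted truthy values, the precedence of `FORCE_NATIVE` over `REQUIRE_VAGRANT` when both are set, and the use of `docker info` as the daemon check are all assumptions.

```python
import shutil
import subprocess
from typing import Mapping


def docker_daemon_available() -> bool:
    """Best-effort check: is a docker CLI on PATH and can it reach a daemon?"""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False
    return result.returncode == 0


def truthy(value: str) -> bool:
    # Assumed convention for boolean settings.
    return value.strip().lower() in ("1", "true", "yes")


def choose_mode(config: Mapping[str, str], docker_available: bool) -> str:
    """Return 'native', 'vagrant' or 'docker' for this build."""
    if truthy(config.get("FORCE_NATIVE", "")):
        return "native"  # assumed to win if both flags are set
    if truthy(config.get("REQUIRE_VAGRANT", "")) or not docker_available:
        return "vagrant"  # the build then runs in Docker inside the VM
    return "docker"
```

Passing `docker_available` in as a parameter keeps the decision itself easy to test without a Docker installation.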
The build script recognizes common build tools and provides reasonable configurations for each. The script linked below supports Maven, Gradle, Activator and Node. For example, if the script detects a `pom.xml` file it will assume the project uses Maven as its build tool (e.g. `BUILD_TOOL="mvn"`).
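Detection of this kind can be sketched as a lookup of well-known marker files. Only the `pom.xml` to `mvn` mapping comes from the description above; the other marker files and commands in the table are assumptions for illustration, as is the function name.

```python
from pathlib import Path
from typing import Optional

# (marker file, build command), checked in order.
MARKERS = (
    ("pom.xml", "mvn"),          # stated in the text
    ("build.gradle", "gradle"),  # assumed marker and command
    ("build.sbt", "activator"),  # assumed marker and command
    ("package.json", "npm"),     # assumed marker and command
)


def detect_build_tool(project_dir: str, override: Optional[str] = None) -> Optional[str]:
    """Return the build command for a project, or None if none is recognized."""
    # An explicit BUILD_TOOL setting always beats auto-detection.
    if override:
        return override
    root = Path(project_dir)
    for marker, tool in MARKERS:
        if (root / marker).is_file():
            return tool
    return None


if __name__ == "__main__":
    print(detect_build_tool("."))
```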
One of the common pitfalls of build isolation is degraded performance. Specifically, builds often cache dependencies and intermediate artifacts, which would normally be discarded with full isolation. To improve performance, the builder script mounts the artifact caches used by the specific build tools from outside the container. However, each project gets its own cache. This strategy uses much more disk space, but it preserves isolation between projects while minimizing the performance penalty. The resulting performance is reasonable for automated builds in Jenkins, but developers typically have far less patience, so we allow them to execute builds locally without isolation, should they so choose, with the `FORCE_NATIVE` option.
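The per-project cache mount can be sketched as follows. The cache directory names, the host cache root, the container paths and the `registry/image` reference format are assumptions chosen for illustration; `docker run`, `--rm`, `-v` and `-w` are standard Docker CLI options.

```python
from typing import List, Sequence

# Assumed cache directory per build tool, relative to the container user's home.
CACHE_DIRS = {"mvn": ".m2", "gradle": ".gradle", "activator": ".ivy2", "npm": ".npm"}


def docker_run_command(project_dir: str, project_name: str, registry: str,
                       image: str, tool: str, args: Sequence[str],
                       cache_root: str = "/var/cache/builder",
                       container_home: str = "/root") -> List[str]:
    """Build the argv for running a project's build tool in a throwaway container."""
    cmd = [
        "docker", "run", "--rm",            # discard the container afterwards
        "-v", f"{project_dir}:/workspace",  # mount the source tree
        "-w", "/workspace",
    ]
    cache_dir = CACHE_DIRS.get(tool)
    if cache_dir:
        # One cache per project, so projects cannot pollute each other.
        host_cache = f"{cache_root}/{project_name}/{cache_dir}"
        cmd += ["-v", f"{host_cache}:{container_home}/{cache_dir}"]
    # Assumes the registry is given as host[:port] without a scheme.
    cmd += [f"{registry}/{image}", tool, *args]
    return cmd


if __name__ == "__main__":
    print(" ".join(docker_run_command(
        "/src/app", "app", "registry.example.com", "build/java:1.4.2",
        "mvn", ["clean", "verify"])))
```

Keying the host cache path on the project name is what trades disk space for isolation: two projects that use Maven never share a local repository.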
You can download the script here. While you can include a copy in each project, we recommend that the build bootstrap script itself be bootstrapped into projects with a small wrapper which allows it to be versioned, downloaded and cached from a repository like Artifactory. This last step is left as an exercise for the reader.
The build bootstrap script has helped Qualtrics replace hundreds of lines of bash script per project while providing consistent isolation across projects. The fact that it is not a new build tool has made it a hit with developers, while the isolation consistency that it brings has quietly removed a large class of problems from our development cycle.