Writing a Framework for Custom ETL Automations (pt. 1 of 2)
Dennis Callanan // September 24, 2019 // 12min read
Every customer has their own programs, technologies, business goals, and requirements to fulfill. Meeting these requirements often means making our systems work in harmony with theirs through Extract, Transform, Load (ETL) automations, where the output of one system feeds into the input of another. A CRM is one example of a system that exchanges data with the Qualtrics platform. Writing these custom integrations on a per-client basis can be challenging and expensive. However, good engineering is all about identifying and abstracting the high-level similarities while offering flexibility for the low-level custom elements of a workflow.
In this post (and the next) we’re going to talk about the philosophy and design of the framework we developed internally for optimizing and streamlining the creation of custom ETL automations for clients.
Given the frequency of client requests, Engineering Services (EMEA) didn’t want to keep reinventing the wheel every time custom work was required (engineers hate that!). We wanted a robust solution that was streamlined, could be extended with custom functionality, could be well maintained, and could enforce good standards. We wanted it to run reliably and predictably, making life easier for different teams: engineers, partners, solution architects, and on-call engineers.
For part 1, I’m going to talk about the many decisions that were made in order to achieve this goal. Specifically, I’ll focus on architecture (AWS / Terraform), design (release engineering, project structure, and design patterns), development philosophy, and automation.
An ETL automation, in the more specific context of Qualtrics Engineering Services, is:
An automation which takes input from a client, processes data through the Qualtrics systems and provides an output to the client, such that the client’s system and data model can integrate seamlessly with ours
Below is one such example.
Many automations follow this kind of workflow (such as response imports, metadata imports, etc.), but most of them require non-trivial custom engineering at some point. This custom engineering may involve parsing and validating the input file, enriching the data, mapping the data (from an old vendor to Qualtrics), conditionally scheduling distributions, conditionally sampling contact data, and more. These types of automations, with custom code sprinkled throughout, were the target for this framework: we wanted to focus on custom automations that could not simply be defined through basic configuration files.
Ultimately, we wanted to use this framework to automate the generation of boilerplate projects (full source code repositories) so as to guide the developers into good standards and practices: encompassing infrastructure as code, binary/config packaging and release engineering.
So why impose certain standards for project structure and source code methodologies? First, we want to ensure consistency across projects; consistency leads to simpler maintenance, less ramp-up for handovers, faster comprehension for on-call engineers, and stronger vision and direction to guide projects to completion. Equally as important as consistency is speed: we wanted to cut design and development time from the induction of these projects.
And why a framework? Given the conceptual similarity of many ETL automations, it’s important to define these workflows carefully and abstractly with default implementation for the most common use cases. This way engineers do not have to repeat any tasks, and duplicate code isn’t scattered across project repositories. But we want to give developers the freedom to extend this default functionality, if needed. This has many benefits:
- It gives engineers the ability to engineer the parts of the solutions specific to their problem without having to reinvent the wheel
- It allows partners to easily develop solutions for clients, while enforcing good practices through a streamlined process
- It abstracts most of the common implementations which interact with Qualtrics APIs. This can be maintained to a high standard by Qualtrics Engineers (and if hypothetically API interfaces change, we need only reflect this update in one place).
- This streamlined process (through the use of the framework and project generator) will allow us to reduce design time for projects as well as reduce scoping time for solution architects, as the areas of custom development are more clearly determined by the state of the framework. This in turn leads to confidence and quick response time for client ETL requests.
Why did we choose to build this framework on top of AWS? In Qualtrics Engineering Services we use a standalone infrastructure model which provisions new AWS cloud environments on a per-client basis. This has a few benefits:
- It allows us to isolate infrastructure clearly for each independent client project
- It is easy to observe usage of computational resources and safely determine a pricing model
- It is horizontally scalable, secure, and requires less maintenance
- It allows developers to easily spin up infrastructural components which lend themselves well to such automations (e.g. pipes of queues and microservices for ETL processes and retry logic).
Now, let’s take a look at the high-level design and architecture of this framework. To do this we’ll investigate the common workflows of ETL automations in more depth. An ETL Automation generally consists of the following stages:
- Pre-processing of data (e.g. parsing and validating from source)
- Processing of data (e.g. posting or retrieving data from API endpoints within the Qualtrics system)
- Post-processing of data (e.g. constructing responses and returning these to the client in some format)
The pre-processing stage may involve pulling the source data from a client’s SFTP Server (or S3 Bucket), or consuming a client’s POST request to an API endpoint. This stage subsequently parses, validates and formats this data, pushing it to the processing stage of the workflow. The post-processing stage essentially performs the same steps in the reverse direction (as an export to the client rather than an import from the client).
In between, we have the processing stage, which uses the formatted data to perform tasks which interact with Qualtrics systems. An example of a step in this stage would be to import contacts into a mailing list. This takes contacts as an input, and outputs a mailing list ID. The next processing step would then take this mailing list ID as an input, schedule a distribution to that mailing list, and output the distribution ID.
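As a rough sketch, the chaining described above might look like the following. The function names and payload shapes here are invented for illustration; in the real framework each step wraps a Qualtrics API call.

```python
# Hypothetical sketch of two chained processing steps. Each step consumes the
# previous step's output, so a failed step can be retried in isolation without
# re-running the steps before it.

def import_contacts(contacts: list[dict]) -> dict:
    """Step 1: import contacts, output a mailing list ID."""
    mailing_list_id = "ML_" + str(len(contacts))  # stand-in for an API response
    return {"mailingListId": mailing_list_id}

def schedule_distribution(step_input: dict) -> dict:
    """Step 2: schedule a distribution to the mailing list from step 1."""
    return {"distributionId": "EMD_demo", "mailingListId": step_input["mailingListId"]}

contacts = [{"email": "a@example.com"}, {"email": "b@example.com"}]
result = schedule_distribution(import_contacts(contacts))
```

Because each step's input is exactly the previous step's output, the boundary between steps doubles as a natural retry boundary.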
It’s good practice to associate a single Qualtrics API call with a single processing step. Why is this? If one step fails using the output of a previous step (which was successful), we want to retry only the step that failed; combined with retry logic, this keeps the overall workflow idempotent.
To make this concept more concrete, we will represent each step discussed above as an independent and well defined service in AWS. Each step consists of a microservice (Lambda) which processes the request input, and outputs the result to the next step (i.e. the next service).
Lambdas are very powerful for this use case.
- They are serverless and so require no maintenance, meaning developers do not need to worry about the services they consume in their projects
- They are easy to deploy and develop (through SLS)
- They are highly scalable and have retry logic, while having a simple relationship with queues (we’ll get there later)
- They map conceptually onto the “pipe” workflow that ETL automations follow
- Tasks can be delegated to a swarm of asynchronous lambdas for batch processing
- Simple configuration can allow for synchronous consumption of requests from queues (for hitting non-concurrent APIs), or unlimited reserved concurrencies (for asynchronous APIs)
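To make the Lambda-per-step idea concrete, here is a minimal handler shape for a step consuming a batch of SQS messages. The record-processing logic is a placeholder; a real step would call a Qualtrics API for each message.

```python
import json

# Minimal sketch of a Lambda handler wired to an SQS event source. SQS events
# deliver a "Records" list, where each record's "body" holds one message.

def handler(event, context=None):
    """Process a batch of SQS messages; each body is one unit of work."""
    processed = []
    for record in event.get("Records", []):
        payload = json.loads(record["body"])
        processed.append(payload)  # real step: call a Qualtrics API here
    return {"batchSize": len(processed)}
```

The same handler shape works whether the queue delivers one message at a time (for non-concurrent APIs) or large batches fanned out across many concurrent invocations.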
There are a few general pitfalls with Lambda however:
- The serverless nature means our automations should ideally not depend on databases, as up to 1000 asynchronous lambdas could fire off. Caching etc is not a realistic option. Lucky for us, we are just hitting our Qualtrics APIs (which take on all the heavy load) - so this is not a concern.
- Lambda functions may retry unintentionally. Therefore, each Lambda function should be idempotent. Once again, lucky for us, our APIs are built to handle this. For example, our Response Import API allows the user to pass an idempotency key to prevent importing identical responses in the case of an unintended retry. See Response Import docs.
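The idempotency-key behavior can be simulated in a few lines. This is only a conceptual model; the real safeguard lives server-side in the Qualtrics Response Import API, and the names below are invented for illustration.

```python
# Illustrative simulation of idempotency-key semantics: repeating a request
# with the same key returns the original result instead of creating a
# duplicate import.

_seen_keys: dict[str, str] = {}

def import_response(payload: dict, idempotency_key: str) -> str:
    """Return the existing import ID if this key was already processed."""
    if idempotency_key in _seen_keys:
        return _seen_keys[idempotency_key]
    import_id = f"IMP_{len(_seen_keys) + 1}"  # stand-in for a real import ID
    _seen_keys[idempotency_key] = import_id
    return import_id
```

With this property on the API side, an unintended Lambda retry simply replays the same request and gets the same result back.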
In an ideal world we could join up microservices like this:
The problem with the above example, though, is that it doesn’t account for failed steps. For example, how does our ETL behave if the “Import Contacts” step fails? The Qualtrics API on which the Contact Importer step depends may have high availability, but there is always the possibility of failure. Even worse, these automations often run on similar cron schedules hitting the same API endpoints, which only increases the likelihood of failure. We can incorporate retry logic for each Lambda, but that doesn’t really scale laterally because of the aforementioned cron problem. So what do we do? In this case, we introduce queues (SQS), because queues can:
- Manage the batch sizes as data flows through the steps of the workflow
- Delegate batches to multiple asynchronous microservices
- Use DLQs (Dead-Letter Queues) for simpler alerting and management of failed automations by on-call engineers, without killing the entire automation
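The retry-then-DLQ behavior can be sketched conceptually like this. In practice an SQS redrive policy handles the routing, not application code, and the names here are illustrative.

```python
# Conceptual model of SQS redrive behavior: a message is attempted up to
# max_receives times; if every attempt fails, it is parked on the DLQ for
# on-call inspection and the rest of the pipeline keeps flowing.

def process_with_retries(message, step, dlq, max_receives=3):
    """Attempt a step; route the message to the DLQ once retries are exhausted."""
    for attempt in range(1, max_receives + 1):
        try:
            return step(message)
        except Exception:
            if attempt == max_receives:
                dlq.append(message)  # failed message parked, automation continues
                return None

dead_letters: list = []

def always_fails(message):
    raise RuntimeError("downstream API unavailable")

result = process_with_retries({"id": 7}, always_fails, dead_letters)
```

The key design point is that a poisoned message is isolated rather than blocking the queue: the automation degrades per-message instead of failing wholesale.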
In summary, our high-level visualization of services in relation to our example workflow looks something like this:
As we can see, each service is represented by a queue and a microservice. This encompasses a single block / function: a reusable component which we can develop and maintain independently and inject into any project or workflow.
In this way, a developer can decide whether to incorporate certain services in their workflow. Do they need a data validator? Do they need a distribution scheduler? Do they need a response importer? Do they need to schedule multiple distributions at different times? Etc.
Finally, we want to make these services extensible. To do this, each microservice (Lambda) imports an abstract library which is maintained internally by Qualtrics. The classes / modules available through the library contain methods with default functionality, documented and exposed to the developer, such that they can be overridden to suit the developer’s requirements. These protected methods should have well-defined interfaces and conceptually solid use cases to guide developers properly. We’ll talk about this more in the implementation section.
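The extensibility model might take a shape like the following. The class and method names are hypothetical stand-ins for the library's real API: a base step ships sensible defaults, and a project overrides only the hooks it needs.

```python
# Hypothetical sketch of the shared library's override model. A project
# subclasses a step and replaces individual hooks, keeping the rest of the
# default behavior intact.

class ContactImportStep:
    """Default contact-import step; override hooks to customize."""

    def validate(self, record: dict) -> bool:
        # Default rule: require an email field. Override for client-specific checks.
        return "email" in record

    def transform(self, record: dict) -> dict:
        # Default: pass the record through unchanged.
        return record

    def run(self, records: list[dict]) -> list[dict]:
        return [self.transform(r) for r in records if self.validate(r)]

class LowercasingImportStep(ContactImportStep):
    """A project-specific override of the default transform hook."""

    def transform(self, record: dict) -> dict:
        return {key: value.lower() for key, value in record.items()}
```

Because `run` is defined once in the base class, Qualtrics engineers can evolve the common workflow (or adapt to API changes) in one place while client projects keep only their custom hooks.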
Overall, this service-oriented architecture is the foundation of the framework.
In this post we talked about the design of a framework for developing custom ETL automations, and the reasoning behind it. In part 2 we'll dive into the nitty-gritty implementation details, in particular Terraform and Serverless.