Boosting Big Data workloads with Presto Auto Scaling

The Data Engineering team at Eventbrite recently completed several significant improvements to our data ecosystem. In particular, we focused on upgrading our data warehouse infrastructure and improving the tools used to drive Eventbrite’s data-driven analytics.

Here are a few highlights:

  • Transitioned to a new Hadoop cluster. The result is a more reliable, secure, and performant data warehouse environment.
  • Upgraded to the latest version of Tableau and migrated our Tableau servers to the same AWS infrastructure as Presto. We also configured Tableau to connect via its own dedicated Presto cluster. The data transfer rates, especially for Tableau extracts, are 10x faster!
  • We upgraded Presto and fine-tuned the resource allocation (via AWS Auto Scaling) to make the environment optimal for Eventbrite’s analysts. Presto is now faster and more stable. Our daily Tableau dashboards, as well as our ad-hoc SQL queries, are running 2 to 4 times faster.

This post focuses on how Eventbrite leverages AWS Auto Scaling for Presto using Auto Scaling Groups, Scaling Policies, and Launch Configurations. This update has allowed us to meet the data exploration needs of our Engineers, Analysts, and Data Scientists by providing better throughput at a fraction of the cost.

High level overview

Let’s start with a high-level view of our data warehouse environment running on AWS.

Auto Scale Overview

Analytics tools: Presto, Superset and Tableau

We’re using Presto to access data in our data warehouse. Presto is a tool designed to query vast amounts of data using distributed queries. It supports the ANSI SQL standard, including complex queries, aggregations, and joins. The Presto team designed it as an alternative to tools that query HDFS using pipelines of MapReduce jobs. It connects to a Hive Metastore allowing users to share the same data with Hive, Spark, and other Hadoop ecosystem tools.

We’re also using Apache Superset packaged alongside Presto. Superset is a data exploration web application that enables users to process data in a variety of ways including writing SQL queries, creating new tables and downloading data in CSV format. Among other tools, we rely heavily on Superset’s SQL Lab IDE to explore and preview tables in Presto, compose SQL queries, and save output files as CSV.

We’re exploring the use of Superset for dashboard prototyping although currently the majority of our data visualization requirements are being met by Tableau. We use Tableau to represent Eventbrite’s data in dashboards that are easily digestible by the business.

The advantage of Superset is that it’s open-source and cost-effective, although we have performance concerns due to lack of caching and it’s missing some features (triggers on charts, tool-tips, support for non-SQL functions, scheduling) that we would like to see. We plan to continue to leverage Tableau as our data visualization tool, but we also plan to adopt more Superset usage in the future.

Both Tableau and Superset connect to Presto, which retrieves data from Hive tables located on S3 and HDFS. The data is commonly stored as Parquet.

Auto scaling overview

Amazon EC2 Auto Scaling enables us to follow the demand curve for our applications, and thus reduces the need to manually provision Amazon EC2 capacity in advance. For example, we can use target tracking scaling policies to select a load metric for our application, such as CPU utilization or one of the Presto metrics.

It’s critical to understand the terminology for AWS Auto Scaling. Concepts such as “Launch Configuration,” “Auto Scaling Group,” and “Auto Scaling Policy” are the vital components we show below. Here is a diagram that shows the relationship between the main components of AWS Auto Scaling. As an old-school data modeler, I tend to think in terms of entities and relationships via the traditional ERD model 😀

Auto Scaling ERD

Presto auto scaling

We’re using AWS Auto Scaling for our Presto “spot” instances based on (I) CPU usage and (II) number of queries (only used for scaledown). Here is an overview of our EC2 auto-scaling setup for Presto.

Auto Scaling with Presto

Here are some sample policies:

Policy type:  Simple scaling (I)

Execute policy when:  CPU Utilization >= 50 for 60 seconds for the metric dimensions .

Take the action:  Add 10 instances (provided by EC2).

Policy type: Simple scaling (II)

Execute policy when: running Queries <= 0 for 2 consecutive periods of 300 seconds for the metric dimensions.

Take the action: Set to 0 instances.
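To make the two sample policies concrete, here is a minimal sketch of the decision they encode. It is plain Python, not AWS code: the `ClusterState` class and `desired_workers` function are hypothetical names, and checking the scale-up rule before the scale-down rule is an assumption made for illustration. A real Auto Scaling group would also clamp the result to its configured minimum and maximum.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Snapshot of the metrics the two sample policies look at (hypothetical)."""
    cpu_percent: float          # average CPU over the last 60-second period
    running_queries: list[int]  # one count per 300-second period, newest last
    worker_count: int


def desired_workers(state: ClusterState, step: int = 10) -> int:
    """Return the worker count the two sample policies would ask for."""
    # Policy (I): CPU utilization >= 50 for a 60-second period -> add 10 instances.
    if state.cpu_percent >= 50:
        return state.worker_count + step
    # Policy (II): running queries <= 0 for 2 consecutive periods -> set to 0.
    last_two = state.running_queries[-2:]
    if len(last_two) == 2 and all(count <= 0 for count in last_two):
        return 0
    # Neither alarm fired: keep the current size.
    return state.worker_count
```

The asymmetry is deliberate in the policies themselves: scaling up reacts within a minute, while scaling down waits for ten quiet minutes.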

Note: Eventbrite’s Data Engineering team developed a custom Python script to handshake with CloudWatch during scaledown. It handles the race condition where another query comes in during the scaledown process. We’ve added “termination protection,” which leverages this Python script (running as a daemon) on each Presto worker node. If the script detects that a query is currently running on its node, that node won’t scale down.
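The script itself is not shown in this post, so the following is only a sketch of the idea under stated assumptions. Both callables are hypothetical hooks: one would ask the local Presto worker how many queries it is running, and the other would toggle scale-in protection for this instance in its Auto Scaling group.

```python
from typing import Callable


def refresh_scale_in_protection(
    count_running_queries: Callable[[], int],
    set_protection: Callable[[bool], None],
) -> bool:
    """Protect this worker from scale-in while it is serving a query.

    Both arguments are hypothetical hooks supplied by the caller; this
    function only decides whether protection should be on or off.
    """
    busy = count_running_queries() > 0
    set_protection(busy)
    return busy
```

A daemon on each worker could call this function on a short interval, so that protection is switched back on if a new query arrives before the instance is removed.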

Tableau scheduled actions

We’re using “Scheduled Scaling” features for our Tableau Presto instances as well as our “base” instances used for Presto. We scale up the instances in the morning and scale down at night. We’ve set up scheduled scaling based on predictable workloads such as Tableau.

“Scheduled Scaling” requires configuration of scheduled actions, which tell Amazon EC2 Auto Scaling to act at a specific time. For each scheduled action, we’ve specified the start time and the new minimum, maximum, and desired size of the group. Here is a sample setup for scheduled actions:

Auto scale actions
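As a rough illustration of what a scheduled action carries, the sketch below builds the settings as plain dictionaries. The keys mirror the parameter names of the AWS API for scheduled actions; the group name, action names, cron times, and sizes are all hypothetical and are not our production values.

```python
def scheduled_action(group: str, name: str, cron: str,
                     minimum: int, maximum: int, desired: int) -> dict:
    """Build the settings for one scheduled action, validating the sizes."""
    if not minimum <= desired <= maximum:
        raise ValueError("desired capacity must lie between minimum and maximum")
    return {
        "AutoScalingGroupName": group,
        "ScheduledActionName": name,
        "Recurrence": cron,  # cron expression, evaluated in UTC by default
        "MinSize": minimum,
        "MaxSize": maximum,
        "DesiredCapacity": desired,
    }


# Hypothetical values: scale up on weekday mornings, scale down every night.
morning = scheduled_action("presto-tableau", "scale-up-morning", "0 13 * * 1-5", 5, 20, 10)
night = scheduled_action("presto-tableau", "scale-down-night", "0 3 * * *", 0, 20, 0)
```

Validating the sizes up front catches a common mistake, a desired capacity outside the group's bounds, before anything is sent to AWS.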


We’ve enabled Auto Scaling Group Metrics to identify capacity changes via CloudWatch alarms. When a threshold is breached, these alarms trigger the Auto Scaling groups to execute the corresponding policy. In some cases, we’re using EC2 alerts, and in others, we’re pushing custom metrics to CloudWatch through Python scripts.
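For a sense of what a custom metric looks like, here is a sketch that builds a CloudWatch-style payload for a running-query count. The dictionary keys follow the shape CloudWatch expects for custom metric data, but the namespace, metric name, and dimension are hypothetical, and the sketch stops short of sending anything.

```python
from datetime import datetime, timezone


def running_queries_metric(cluster: str, running_queries: int) -> dict:
    """Build a custom-metric payload for the number of running queries."""
    return {
        "Namespace": "Presto",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "RunningQueries",  # hypothetical metric name
                "Dimensions": [{"Name": "Cluster", "Value": cluster}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(running_queries),
                "Unit": "Count",
            }
        ],
    }


payload = running_queries_metric("adhoc", 3)
```

An alarm on a metric like this one is what lets a scale-down policy wait until no queries are running.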

Sample CloudWatch alarms:

Multiple Presto clusters

We’ve separated Tableau connections from ad-hoc Presto connections. This abstraction allows us to separate ad-hoc query usage from Tableau usage.


Our Presto workers read data that is written by our persistent EMR clusters.  Our ingestion and ETL jobs run on daily and hourly scheduled EMR clusters with access to Spark, Hive and Sqoop. Using EMR allows us to decouple storage from computation by using a combination of S3 and a custom HDFS cluster. The key is we only pay for computation when we use it!

We have multiple EMR clusters that write the data to Hive tables backed by S3 and  HDFS. We launch EMR clusters to run our ETL processes that load our data warehouse tables daily/hourly. We don’t currently tie our EMR clusters to auto-scaling.

By default, EMR stores Hive Metastore information in a MySQL database on the master node. It is the central repository of Apache Hive metadata and includes information such as schema structure, location, and partitions. When a cluster terminates, we lose the local data because the node file systems use ephemeral storage. We need the Metastore to persist, so we’ve created an external Metastore that exists outside the cluster.

We’re not using the AWS Glue Data Catalog. The Data Engineering team at Eventbrite is happy managing our Hive Metastore on Amazon Aurora. If something breaks, as has happened in the past with Presto race conditions when writing to the Hive Metastore, then we’re comfortable fixing it ourselves.

The Data Engineering team created a persistent EMR single node “cluster” used by Presto to access Hive. Presto is configured to read from this cluster to access the Hive Metastore. The Presto workers communicate with the cluster to relay where the data lives, partitions, and table structures.

The end

In summary, we’ve focused on upgrading our data warehouse infrastructure and improving the tools used to drive Eventbrite’s data-driven analytics.  AWS Auto Scaling has allowed us to improve efficiency for our analysts while saving on cost.  Benefits include:

Decreased Costs

AWS Auto Scaling allows us to only pay for the resources we need. When demand drops, AWS Auto Scaling removes any excess resource capacity, so we avoid overspending.

Improved Elasticity

AWS Auto Scaling allows us to dynamically increase and decrease capacity as needed. We’ve also eliminated the lost productivity that came from the non-trivial rate of queries failing because of capacity issues.

Improved Monitoring

We use metrics in Amazon CloudWatch to verify that our system is performing as expected. We also send metrics to CloudWatch that can be used to trigger AWS Auto Scaling policies we use to manage capacity.

All comments are welcome. Thanks to Eventbrite’s Data Engineering crew (Brandon Hamric, Alex Meyer, Beck Cronin-Dixon, Gray Pickney and Paul Edwards) for executing on the plan to upgrade Eventbrite’s data ecosystem. Special thanks to Rainu Ittycheriah, Jasper Groot, and Jeremy Bakker for contributing/reviewing this blog post.

You can learn more about Eventbrite’s data infrastructure by checking out my previous post at Looking under the hood of the Eventbrite data pipeline.

Automated Cross-Browser Testing for WebGL— It’s Not Going to Happen

Apologies to the folks who found this post while searching for “automated WebGL testing,” “how to write cross-browser WebGL tests,” or similar. I’ve been there, and it is not my favorite part of the job. Sadly I do not know a magic recipe for writing cross-browser acceptance tests for web apps that integrate WebGL canvas interactions as part of a larger user flow. This post offers a look into how the Reserved squad at Eventbrite uses Rainforest QA to test complex WebGL flows.

I’m a frontend software engineer on the Reserved squad, which recently (at the time of writing) launched an end-to-end experience for reserving seats within Eventbrite’s embedded checkout flow. While we were developing this feature, we ran into a roadblock: how could we write reliable acceptance tests for our WebGL-dependent flows? Furthermore, how could we reliably test our user flows without sinking hundreds of additional engineering hours into coercing Selenium to click on the precise canvas coordinates necessary to reserve a seat? We decided to try testing some of our user flows with a crowdsourced quality assurance (QA) platform called Rainforest QA, and have been quite happy to ship the results.

WebGL: What it’s good at, and one unfortunate consequence

WebGL is useful for rendering complex 2D and 3D graphics in the client’s web browser. It’s natively supported by all major browsers and, under the hood, interfaces with the OpenGL API to render content in the canvas element. Because it allows code to run on the client’s GPU, there are significant performance benefits when you need to render and listen to actions on hundreds or thousands of elements.

My squad at Eventbrite uses WebGL (with help from Three.js, which you can learn more about in an earlier blog post) to render customizable venue maps that allow organizers to determine seat selling order. Once the organizer publishes the event, we allow attendees to choose the location of their seat on the rendered venue map. Because WebGL draws the venue maps in the canvas element rather than needlessly generating DOM elements for every seat, we can provide a relatively performant experience, even for maps with tens of thousands of seats. The only major drawback is that there is no DOM element to target in our acceptance tests when we want to test what happens when a user clicks on a seat.

The code to render a seat map using Three.js looks roughly like this:

// Initialize scene, camera values based on client browser width
const {scene, camera} = getSceneAndCamera();
const element = document.getElementById('canvas');
// Draw into the existing canvas element instead of creating a new one
const renderer = new THREE.WebGLRenderer({canvas: element});

// Add objects like seats, stage, etc. to the scene, then render it
renderer.render(scene, camera);

This code renders content in the canvas element:

But when we inspect the generated markup, this is all that we see:

<canvas width="719" height="656"></canvas>

Because the canvas element does not contain targetable DOM elements, simulating a seat click using WebDriver or other test scripting frameworks requires specifying exact coordinates within the canvas element.

How did Rainforest solve our testing problem?

For several months, my squad had been working in a green pasture of unreleased code as we made steady progress on new pick-a-seat features. Throughout the development process, we maintained test coverage with unit tests, integration tests, and page-level JS functional tests using enzyme and fetch-mock. However, our test coverage contained a glaring hole: we had not yet written tests that fully verified our user stories.

Acceptance tests are black-box tests that formally describe a user story and that we run at the system level. An acceptance test script might load a URL in a virtual machine (VM), automate some user actions, and confirm that the user can complete a flow (such as checkout) as expected. Eventbrite engineers rely on acceptance tests to ensure that our user interfaces don’t break when squads across the organization push code to our shared, continuously deployed repositories. Most acceptance tests at Eventbrite are written using Selenium WebDriver and often look something like this:

    def test_checkout_widget_free_event(self):
        """Verify it is possible to purchase a free ticket."""
        # Go to the test page

        # Select a ticket and click the checkout button
        self.checkout_widget.select_ticket_quantity(, 1)

        # Verify the purchase confirmation page is displayed

But when targeting a canvas element, clicking on a seat looks a bit more like this:

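   from selenium.webdriver.common.action_chains import ActionChains
   action = ActionChains(webdriver_instance)
   # Move the pointer by the seat's pixel offset, then click
   action.move_by_offset(seat_px, seat_py).click().perform()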

In other words, we need to know the exact x and y coordinates of the seat within the canvas element. Even after the chore of automating clicks on precise coordinates within the canvas, we knew that minor style changes might require us to revisit each test and hunt down updated coordinates.
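To see why these coordinates are so brittle, consider a minimal sketch of the standard viewport mapping from normalized device coordinates to canvas pixels. This is not our test code: the function name is hypothetical, and it assumes the seat's position has already been projected into the -1 to 1 range.

```python
def ndc_to_canvas_px(x_ndc: float, y_ndc: float,
                     width: int, height: int) -> tuple[float, float]:
    """Map normalized device coordinates (-1..1, y pointing up) to canvas
    pixels (origin at the top-left corner, y pointing down)."""
    px = (x_ndc + 1) / 2 * width
    py = (1 - y_ndc) / 2 * height
    return px, py


# The same seat lands on different pixels as soon as the canvas is resized.
original = ndc_to_canvas_px(0.5, 0.5, 719, 656)
resized = ndc_to_canvas_px(0.5, 0.5, 800, 600)
```

Any style change that alters the canvas width or height moves every seat's pixel position, which is exactly the maintenance burden described above.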

As the projected release date loomed near, we considered our options and determined that it would require several dedicated sprints to write the tests needed to thoroughly cover all of our new features. What if, instead of wrangling data and coordinates, we could write out test plans that could be quickly verified by human QA testers?

Enter Rainforest! Rainforest is a crowdsourced QA solution that puts our flow in front of real users. Because testers access sessions through a VM, we can specify which browsers they need to test, and they can run the tests against our staging environment. The Rainforest app runs the test suite on a customizable schedule, and the entire test run is parallelized and completed in less than 30 minutes. We wrote out all of our as-yet-untested user story test cases (in plain English) and got the system up and running.

Our Rainforest tests look like this:

We write each step of the test as a direction, followed by a yes-or-no question for the tester to answer. During a testing session, the tester follows the instructions, such as: “Click ‘Buy on Map’ located on the right-hand side.” Next, they mark the step as passed if the click caused the rendered map to zoom to the two highlighted seats.

Our key to Rainforest success: one-step event creation

Once we decided to proceed with this approach, our squad invested some time into developing an API that would allow us to automate a critical step of this workflow. When Rainforest testers log into their VMs, we provide them a URL that will, upon load, create a new QA user account with an event that is in the exact state needed to test the features covered by the test. A tester loading this URL is analogous to an acceptance test run instantiating the factory classes that generate test data for our WebDriver tests.

The endpoint accepts URL parameters that define relevant features of the event:


Loading this URL creates a new QA user with restricted permissions, builds an event with a medium-sized seat map and four ticket types (authored by the new user), and then redirects to the embedded checkout test URL for the given event.

Without this tool, Rainforest testing would require a manual tester dozens of clicks and page refreshes to create an event, design a venue map, publish the event, and then finally reach the checkout flow. Eventbrite engineers have already covered all of these actions with automated acceptance tests elsewhere—when we are testing the seat reservation flow, we want to focus on precisely that. One-step event creation has allowed us to get testers into the correct state to access our flow with a single keystroke.

Additionally, because we have configured Rainforest to run against our staging environment, Rainforest QA testers catch bugs for us before they are released. While unit and integration tests give us confidence that our code works at a more granular level, Rainforest has given us an additional layer of security, assuring that the features we already built are still working so that we can move on to the next challenge.

Universal takeaways

Yes, Rainforest does cost money, and I’m not here to tell you how your company should spend its money. (If you’re curious about Rainforest, you can always request a demo). It’s also not the only solution in this space. Rainforest works very well for us, but a related platform such as Testlio, GlobalAppTesting, TestingBot, or UseTrace may be a better fit for your team.

Here are some takeaways from our case study that might still come in handy:

  • Cross-browser testing pays off. If your current acceptance suite only runs tests against one browser, it might be worth re-evaluating. (If you’re doing your own cross-browser QA, Browserstack is indispensable.)
  • When you automate testing user stories as part of your continuous integration (CI) flow, you ensure that your system reliably meets product requirements.
  • Don’t stop writing automated tests, but do consider how much time you are spending writing and maintaining tests that could be more reliably tested by a human QA tester.
  • You can get the most out of your testing and QA by automating critical steps of the process.

For my squad, Rainforest has been an excellent solution and has helped us catch many browser-specific and complex multi-page bugs before they made their way to the release branch. While we are still working on improving its visibility in our CI flow so that newly introduced bugs are surfaced earlier in the development cycle, automated test runs assure us that our features remain stable across all major browsers. As a developer, I love that I get to spend my time building new features rather than writing and maintaining fussy WebDriver tests.

Have you found another way to save time writing acceptance tests for complex WebGL flows? Do you have questions about our Rainforest experience that I didn’t cover? Do you want to have a conversation about the ethics of crowdsourcing QA work? Let me know what you think in the comments below or on Twitter.

Varnish and A-B Testing: How to Play Nice

Here at Eventbrite, we love building sites that are fast, delightful, and reliable. Caching HTML responses using edge caches, such as Varnish, ensures a lighter load on your servers and a performant experience for the end user. However, doing so can often cause A/B testing frameworks to fail in a sneaky fashion.

Read on to learn some key things to know if you find yourself running an A/B test on a page served via Varnish.

First, a quick overview

What is A/B testing? The Wikipedia page covers the topic well, but here’s a quick TL;DR: A/B testing allows us to expose our users to two slightly different experiences: a control and a variant, where the variant differs from the control in a single, controlled way. We then track each variant against predetermined performance metrics, such as conversion rate to purchase, to decide if the variant provides a real lift over the control.

A/B testing is one of the most useful tools developers and product managers can use to determine what engages their audience best. Often these tests need to live on pages that must be both reliable and performant. That’s where Varnish comes in.

Varnish is an open-source caching HTTP reverse proxy. Essentially, it is a super fast cache that sits in front of any server that understands HTTP. It receives requests from the client and attempts to serve an HTTP response from the cache. If it cannot, it forwards the request to the backend server, stores the server’s response, and passes it along to the client.

Varnish sounds great! Why is it troublesome with A/B Testing?

Varnish caches an entire HTML response, so some requests from the client never hit any server-side application code. If the A/B testing framework assigns variants on the server or relies on any server-side logic, then a person enrolled in variant A may be served a cached response of variant B (and vice versa). This is bad. The experiment data becomes corrupt, and any potential insights are useless. If the A/B test is entirely separate from any backend logic code, there may not be any problem at all!
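The failure is easy to reproduce with a toy model. The sketch below is not Varnish; it is a dictionary keyed only by path, with hypothetical function names, but it shows the same effect: the second visitor never reaches the server-side code that knows about their variant.

```python
def render_home(variant: str) -> str:
    """Stand-in for server-side code that renders the assigned variant."""
    return f"<h1>Home ({variant})</h1>"


cache: dict[str, str] = {}


def serve(path: str, variant: str) -> str:
    """Serve from a cache keyed only by path, like a naive full-page cache."""
    if path not in cache:
        cache[path] = render_home(variant)  # only a miss reaches the server
    return cache[path]


first = serve("/home", "A")   # miss: renders and caches variant A
second = serve("/home", "B")  # hit: a user enrolled in B still receives A
```

The experiment would record the second user as having seen variant B, even though the page they received was variant A.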

What is the Solution?

Utilizing Edge-Side Includes (ESI) with our Varnish layer!

ESI is a small markup language that allows for the dynamic assembly of web content. It gives an edge server (like our Varnish cache) the ability to mix and match content (or fragments) from multiple cached URLs into a single response.

Let’s look at a simple example with a global header we want included via ESI on multiple pages:

    <!-- HTML file with ESI include -->
    <esi:include src="/my_global_header.html" />
    <div>Lots of other content</div>

What is happening here?

The Varnish server understands how to parse the <esi:include> tag and checks whether it has cached the content for the path given in the src value.

On a hit (the asked for item is in the cache): It inserts that cached fragment into the response our system returns to the client. The server did not have to do any additional work to create our global header again; rather, Varnish simply inserted the cached global header directly into the response.

On a miss (the asked for item is not in the cache): The cache checks back with the server and asks for the content represented by the provided path. It then inserts that response into the cache using the src value as the key. Varnish then inserts the fragment into the response and passes it along to the client.
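The hit and miss behavior can be sketched in a few lines of Python. This is a toy stand-in for what an edge cache does, not Varnish's implementation: the `assemble` function, the regular expression, and the `backend` callable are all assumptions made for illustration.

```python
import re
from typing import Callable

# Matches a self-closing include tag and captures its src value.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(body: str, cache: dict[str, str], fetch: Callable[[str], str]) -> str:
    """Replace each ESI include with its fragment, fetching on a cache miss."""
    def resolve(match: re.Match) -> str:
        src = match.group(1)
        if src not in cache:       # miss: ask the backend, store under src
            cache[src] = fetch(src)
        return cache[src]          # hit: reuse the stored fragment
    return ESI_INCLUDE.sub(resolve, body)


backend_calls: list[str] = []


def backend(src: str) -> str:
    """Hypothetical backend that records how often it is asked for a fragment."""
    backend_calls.append(src)
    return "<header>Global header</header>"


page = '<esi:include src="/my_global_header.html" />\n<div>Lots of other content</div>'
fragments: dict[str, str] = {}
first = assemble(page, fragments, backend)   # miss: backend is called once
second = assemble(page, fragments, backend)  # hit: backend is not called again
```

Both responses are identical, but only the first one costs the backend any work.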

Why not Varnish the whole page?

This way we can re-use the global header component on any number of templates, including those that may contain user-specific information which we should not serve via Varnish. It allows us to be surgical with what content we determine we want to cache, and that which we do not.

Applying ESI to our use case

We can utilize ESI to include an entire view, rather than just a fragment of a view, in such a way that we don’t impact performance negatively. Let’s run through an example.

Say we have a complicated homepage. Our server resolves incoming requests for /home to our view handler HomePageView, which returns an HTML response. HomePageView does massive amounts of logic and heavy lifting to provide a great experience to our users. It regularly receives heavy traffic, so we naturally serve it with Varnish to avoid such heavy lifting for every request.

However, our team has been asked to run an experiment on the homepage which would display a picture of a cool cat to users with an odd-numbered guest_id. Here guest_id is a semi-permanent identifier stored in a cookie for a logged out user.

We then can do the following:

  1. Remove any “standard” Varnish configuration that may have been implemented on the homepage to ensure that every single request hits the server. Every request that comes from the client for /home should resolve to the HomePageView.

  2. Move all of the heavy logic that HomePageView was previously doing to a new view titled HomePageViewESI. We’ll come back to this in step 5.

  3. Now instead of the normal heavy logic our HomePageView previously did, we only parse the guest_id from the request. For purposes of the example, let’s say the guest_id is odd. The view then creates an ESI specific path that represents a homepage covered in cats:
    esi_path = my_esi/home/?my_experiment_variant=show_cats

    Aside: The esi_path here acts as our unique cache key.

  4. Then the response which HomePageView returns from our application server is just the following:
    <esi:include src="my_esi/home/?my_experiment_variant=show_cats" />

    That’s it. We don’t include anything else in the server response. Our Varnish server understands how to parse the <esi:include> tag and, if it is a hit, inserts the cached cat-covered homepage specified by the provided esi_path. No application logic was necessary beyond parsing the guest_id to serve the correct content to the end user.

  5. However, what if the esi_path is a miss? Varnish will go back to our server and request the content represented by the provided esi_path, which looks like:

    Meaning that the server needs to resolve incoming requests for /my_esi/home/ in addition to /home.

    This is where we use HomePageViewESI. We configure the server to resolve incoming requests for /my_esi/home/ with HomePageViewESI.

    HomePageViewESI understands how to parse experiment variants encoded into the path, does the heavy lifting, and returns a full, complex, HTML response.

    Varnish consumes this rich HTML content, inserts the returned content as a fragment into the <esi:include> tag that HomePageView returned initially, and stores it in the cache under the key:


    This process guarantees that even cache hits serve the expected variant to a given user. The variant is encoded into the esi_path guaranteeing a unique cache key for each version of the content to be served.
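The steps above can be sketched as two small functions. They are framework-free stand-ins rather than real view classes: the function names mirror HomePageView and HomePageViewESI, the "control" variant name is an assumption (the walkthrough only names show_cats), and the rendered HTML is a placeholder for the heavy lifting.

```python
def variant_for(guest_id: int) -> str:
    """Odd guest_ids see cats; everyone else gets a hypothetical control."""
    return "show_cats" if guest_id % 2 == 1 else "control"


def home_page_view(guest_id: int) -> str:
    """Outer view: runs on every request and returns only the ESI include."""
    esi_path = f"my_esi/home/?my_experiment_variant={variant_for(guest_id)}"
    return f'<esi:include src="{esi_path}" />'


def home_page_view_esi(variant: str) -> str:
    """Inner view: does the heavy rendering and runs only on a cache miss."""
    picture = "<img alt='a cool cat'>" if variant == "show_cats" else ""
    return f"<html><body>{picture}<h1>Home</h1></body></html>"
```

Because the variant is part of the path, two users in different variants can never share a cache entry, while two users in the same variant always do.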


This approach allows for the A/B testing of heavily trafficked pages while keeping them performant. Listed below are some “gotchas” to avoid!

Keep any logic done before returning the initial <esi:include very light.

This logic runs for every request. To hold onto the benefits that our cache provides us, be sure not to bloat this with extraneous logic.

The URL path in the browser does not match the path of the request itself.

On a cache miss, the server now receives a URL prefixed with some ESI-specific identifier (my_esi in our example). This means it doesn’t match the URL shown in the browser.

For example, the browser’s URL may read:

<a href=""></a>

However, the URL path that the server is receiving is:

<a href=""></a>

This can quickly cause downstream issues. Many error loggers and other forms of reporting rely on the request path server side, but that will no longer be an accurate representation of the request put forward by the user. Instead, it will be the constructed ESI URL. Additionally, if the frontend stack relies on the request path or query params, it will no longer be in sync with what is in the browser for these same reasons.

Solutions? There are many. The core of each comes down to two things:

  1. Communication
  2. Abstraction

Which seem pretty counter to each other, huh?

The communication is inward.

It is easy for issues to arise when implementing complex caching solutions, so it is necessary to utilize verbose logging on any page that has ESI implemented for the response. Doing so makes it easier to track down bugs that could otherwise be incredibly cryptic.

Always be sure to include the full path, with query params, in the backend logs for pages served via ESI. The query params provide necessary information as to exactly what response we served to the client.

The abstraction is outward.

It should never become apparent to the user that the request path is something different than what the browser represents as that would negatively impact their trust in the application.

How do we solve for this? If possible, avoid passing the request path to your client, and instead rely on window.location. However, if your application is tied tightly to hydration from the request path and query params, another option is to abstract your request on the server in an ESI-aware way, such that the critical elements represent the original request and not the ESI path.

On a cache miss: Do not enroll a user when building the full view.

Often it is necessary to enroll users based on a specific set of conditions. Those conditions, however, must be met outside of the ESI layer. Attempting to enroll users from within the built view of an ESI layer causes your data to quickly become unreliable, as there is no guarantee that the server will be hit for anything encapsulated within that view.

The solution is to perform any user enrollments on the outermost layer, which we call on every request, before returning the <esi:include src={} /> response. Encode the enrolled value into the path provided to src, as that is the only way to ensure that the data is correct.
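A small simulation shows why the placement matters. Everything here is hypothetical (the names, the "control" variant, the in-memory list standing in for an enrollment service), but the counts illustrate the point: three requests produce three enrollments in the outer layer, while the inner view is built only once.

```python
enrolled: list[tuple[int, str]] = []
cache: dict[str, str] = {}


def outer_view(guest_id: int) -> str:
    """Runs on every request: enroll here, then encode the variant in the path."""
    variant = "show_cats" if guest_id % 2 == 1 else "control"
    enrolled.append((guest_id, variant))
    return f"my_esi/home/?my_experiment_variant={variant}"


def serve(guest_id: int) -> str:
    """Resolve the ESI path through a cache; the inner view runs only on a miss."""
    path = outer_view(guest_id)
    if path not in cache:
        cache[path] = f"<html>rendered for {path}</html>"
    return cache[path]


for guest in (1, 3, 5):
    serve(guest)
```

Had the enrollment call lived inside the cached view, only the first of these three guests would have been recorded.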

All in all, implementing an ESI layer to solve for A/B testing Varnish-cached pages can be difficult and cause confusion; however, it often is the only way to test critical flows in a given application.

Have you ever had issues A/B testing with a cache? Let us know below! You can also ping me on Twitter @VincentBudrovic.

Photo by Christopher Burns on Unsplash

The Fundamental Problem of Search

There is a fundamental problem with search relevance. Users are unaware of their own internal processing as they search, modern search interfaces glean only sparse information from users, and ultimately it is impossible to definitively know what a user really needs. Nevertheless, search applications are expected to act upon this sparse information and provide users with results that best match their intent.

In this post, you will learn about these challenges through several examples. By understanding the blind side of search you can accommodate these challenges and provide your users with a better search experience.

The search problem space

A couple years back I co-authored “Relevant Search,” where I described the mechanics of information retrieval and how to build search applications that match users with the information they seek. But even as I wrote the book something at the back of my mind was weighing me down. Only now has the problem taken shape so that I can describe it. I call it the fundamental problem of search. Consider the following:

  • Modern search interfaces are minimalistic, and users don’t have much opportunity to tell you what they want – usually just a text box.
  • Users have lots of criteria in mind when making a decision. If they want to find an event, then they are considering the type of event,  location, time, date, overall quality, and probably many other things as well.
  • Different users have different criteria in mind when making decisions and different weighting of said criteria.
  • Users often don’t know they have all of these criteria in mind. They believe you can “simply” find a set of matching documents and return them in some “simple” specified order.
  • Users believe that deciding whether or not documents match their search criteria is a binary decision, and users believe that the order of the results can be exact. In truth, both the match and the ordering are naturally “fuzzy.”
  • Despite uncertain user intent and the fuzzy nature of matching and ordering, the relevance engineer has to make both matching and ordering concrete and absolute.

The fundamental problem of search, then, is the fact that relevance engineers are required to perform a fool’s errand. With missing information, with ambiguous information, and with high user expectations, we have to coerce a search engine to somehow return documents that match the user’s intent. Further, the documents are expected to be ordered by hopelessly ill-defined notions of quality and relevance.

In the sections below I’ll delve into several examples of the fundamental problem of search as well as some ideas for rising above the problem. Here’s a hint though: it won’t be easy.

Matching and sorting by relevance

In the simplest possible scenario, the user enters text into a search box, and we match documents and sort the results based solely on relevance. That is, we find all the documents that contain at least one of the user’s keywords, and we score the documents based upon Term Frequency Inverse Doc Frequency scoring (TF*IDF). This process seems pretty straightforward, right? But fuzziness is already starting to creep in.

First, let’s consider how to determine which set of documents match. If your user searches for “green mile” then we as humans recognize that the user is probably talking about the movie called The Green Mile. The search engine is just going to return all documents that match the term green or the term mile. However, documents that match only one of these terms probably aren’t going to be very relevant. One option is to require both terms to match. But this strategy is ill-advised because there are plenty of times where an either/or match might be relevant. If the user searches for “comedy romance” then they might prefer a comedy romance, but toward the end of the list, a comedy or romance film might be just fine.

In principle, another option would be to return every document with a score above some cutoff value X. In practice, this isn’t possible because TF*IDF scoring is not absolute; you don’t score 0 for perfectly irrelevant documents and 1 for perfectly relevant documents. Consider two queries, one for “it” (as in the movie It) and another query for “shawshank” (as in The Shawshank Redemption). The term “it” is very common and so the best matching document – the movie It – will likely get a relatively low TF*IDF score. However, in the case of “shawshank,” let’s say that we don’t actually have a document for The Shawshank Redemption, and the only document that matches is because of a footnote in the description stating “from the director of The Shawshank Redemption.” Though this is a poor match, the score will be quite high because the word “shawshank” is so rare. In this example, we have a low scoring document that is a great match and a high scoring document that is a terrible match. It is just not possible to delineate between matching and non-matching documents based upon score alone.
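To make the “it” versus “shawshank” comparison concrete, here is a minimal sketch. It assumes a made-up five-document corpus and a simplified TF*IDF formula (square root of term frequency times a smoothed inverse document frequency), which is not the exact formula any particular search engine uses. The document titles and the tf_idf helper are hypothetical.

```python
import math
from collections import Counter

# Hypothetical five-document corpus, already tokenized and lowercased.
CORPUS = {
    "It": "it a clown terrorizes children in a small town".split(),
    "Comedy A": "it is a comedy about two friends".split(),
    "Drama B": "it follows a family through a hard winter".split(),
    "Romance C": "it began with a letter".split(),
    "Prison Drama D": "a prison drama from the director of the shawshank redemption".split(),
}


def tf_idf(term, doc_tokens, corpus):
    """Simplified TF*IDF: sqrt(term frequency) * log(1 + N / document frequency)."""
    tf = Counter(doc_tokens)[term]
    df = sum(1 for tokens in corpus.values() if term in tokens)
    if tf == 0 or df == 0:
        return 0.0
    return math.sqrt(tf) * math.log(1 + len(corpus) / df)


# The best possible match for "it" versus a footnote-only match for "shawshank".
it_score = tf_idf("it", CORPUS["It"], CORPUS)
shawshank_score = tf_idf("shawshank", CORPUS["Prison Drama D"], CORPUS)

print(f"'it' -> It: {it_score:.2f}")
print(f"'shawshank' -> Prison Drama D: {shawshank_score:.2f}")
```

In this toy corpus the footnote match for the rare term outscores the ideal match for the common term, so no single cutoff value separates good matches from bad ones across both queries.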

We see that even in the most basic text search scenario we already begin running into situations where we can’t know the user’s intent and where the hope of perfect matching and perfect sorting breaks down. But wait, it gets worse!

Matching and sorting by relevance and quality

“I want to find a documentary about Abraham Lincoln.” Seems simple enough. So, we retrieve all documents that match either abraham or lincoln, and we sort by the default scoring algorithm so that documents that match both terms appear at the top. However, there’s a problem: The user told you they want Abraham Lincoln documents but implicit in the request is that they really want just the high-quality results.

If you’re used to databases and if you’ve just started working with search, then the answer seems obvious – just sort by quality (or popularity or whatever field that serves as a proxy for quality). If you do this you’ll immediately find yourself with a new issue: when sorted by quality, the top results will contain documents that are high quality, but only match one of the terms and aren’t very relevant at all. If you had a documentary on the life and times of the Biblical Abraham for example and if it was a really high-quality documentary, then it would jump up above the documents that are actually about Lincoln.

So again, for someone new to search, the next “answer” is clear: just set the minimum_should_match parameter to 100% to ensure that we only return documents if they have all the terms that the user queries. But this doesn’t really fix the problem. Consider a high-quality documentary about Ulysses S. Grant which merely mentions Abraham Lincoln – a high-quality result, but nevertheless irrelevant to the user. What’s more, minimum_should_match=100% can get you in trouble when the user searches by dumping lots of words in the search box and hoping that some of them match. For example, take “civil war abraham lincoln” – a documentary entitled “President Lincoln and the Civil War” would be entirely relevant yet would not be a match!

The best thing to do here is to boost by quality rather than use absolute sorting. By default, the score of the document is based solely upon how well the text matches according to TF*IDF. You can incorporate quality into the score by adding in some multiplier times the quality: total_score = text_score + k * quality. With this approach, you can in principle adjust k so that the total score is the appropriate balance between text score sorting (k = 0) and absolute quality sorting (k = inf).

Though this approach of linearly adding in quality is very commonly used and is often effective, it can come with some nasty problems of its own. Ideally, you would be able to find some k that works best in all cases. In practice, you cannot. Refer back to the example of the “it” search and the “shawshank” search. In the “it” search, the best matching document will have a much lower text score than in a typical query. And in the “shawshank” query, even average matching documents will have potentially high scores. In both of these cases, if we calculate total_score as text_score + k * quality, then in the “it” search the quality component will have a much greater effect on sorting than it will for the “shawshank” query. It would be nice if somehow we could automatically scale k so that it was proportional to the general text scores for a given search. More on this in a future post!
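The effect of a fixed k can be sketched with a few lines of Python. The text scores and quality values below are invented purely for illustration, and the rank helper is hypothetical; only the formula total_score = text_score + k * quality comes from the discussion above.

```python
def rank(docs, k):
    """Order documents by total_score = text_score + k * quality, best first."""
    return sorted(
        docs,
        key=lambda d: d["text_score"] + k * d["quality"],
        reverse=True,
    )


# Made-up scores: a common term yields low text scores across the board...
it_docs = [
    {"title": "best text match", "text_score": 0.8, "quality": 0.5},
    {"title": "weaker text match", "text_score": 0.6, "quality": 0.9},
]
# ...while a rare term yields high text scores for the same two documents.
shawshank_docs = [
    {"title": "best text match", "text_score": 8.0, "quality": 0.5},
    {"title": "weaker text match", "text_score": 6.0, "quality": 0.9},
]

K = 1.0  # the same boost weight applied to both queries
it_top = rank(it_docs, K)[0]["title"]
shawshank_top = rank(shawshank_docs, K)[0]["title"]
print("it:", it_top)
print("shawshank:", shawshank_top)
```

With the same k, quality is enough to reorder the low-scoring query but not the high-scoring one, which is the imbalance described above. Setting k to 0 recovers pure text-score ordering for both.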

Sidebar: multiple objective optimization

A big part of the underlying theme here is that search is a multiple-objective optimization problem. That is, we are trying to optimize the scoring function so that multiple objectives are optimized simultaneously. However, we do not know – and we cannot know – how important the objectives are relative to one another.

The issue is perhaps most evident in applications like Yelp where the different objectives are called out in the application: You’re looking for a restaurant – how would you like to organize the results? Distance? Price? The number of stars? If you’ve selected a food category or typed in a search, then how important should that be? From Yelp’s perspective, the answer cannot entirely be known. The best we can do is to find some balance between the various dimensions that empirically tends to maximize conversion rates. In modern implementations of search, Learning-to-Rank is a machine learning approach that does precisely this.

Matching and sorting by relevance, quality, and date

Things get even worse when we involve precise quantities like date or price. Often users want to sort their results by date, and this sounds like a perfectly reasonable thing to do. However, you will encounter some terrible side effects when sorting exactly by things like date. Here’s why: the Pareto principle drives the world, and your inventory is no different. If you are in e-commerce, then 20% of your inventory is where you get 80% of your sales, and 80% of your inventory is where you get 20% of your sales.

Let’s say our users are searching for “beer events,” but they want to sort the results by date. Rather than showing the most relevant events such as beer festivals, brewing classes, and beer tastings, we’re going to show them irrelevant, date-ordered events such as business dinners or speed dating events that merely mention beer in their descriptions. Effectively, we are scooping way down into the 80% of less desirable events simply because they happen sooner than the more relevant events that we should be returning.

Solving this is quite a challenge. Consider some alternatives:

  • Boost by date: As presented in the last section you can boost by date and make sure that the best documents are right at the top of the search results kinda sorted by date. But when users choose to sort by a precise quantity like the date, they will see any deviation from date order as evidence that search is broken and not to be trusted.
  • Re-sort the most relevant documents by date: You can use the Elasticsearch rescore feature to find a set of the N most relevant documents and then re-sort them by date. But how do you find a good value for N? If N is too low, then users may page past all N results and you’ll have to either tell them there are no more results OR you’ll have to “show omitted results” and start over by date. On the other hand, if N is too high, then the returned set will dip past the most relevant documents and pull up some of the 80% of less desirable results. Sorting this group by date means that some of these irrelevant or low-quality documents end up at the top of the search results.
  • Sort by date then sort by relevance: If you think this is a good idea, then you haven’t put your thinking cap on yet today. Nevertheless, I hear this tossed around as an option quite a bit. The problem is that if date includes a timestamp, then it is a continuous value. If your documents have timestamps with granularity down to the second then sorting by date followed by quality is no different than just sorting by date.
  • Bucket by date and sort by relevance within each bucket: As a variant on the previous idea, you do have the option of discretizing the date and chunking documents into buckets of day or week and within each bucket sort by static quality. This might be a great solution. If the user doesn’t expect exact date/time, then they will be more forgiving when the documents don’t appear in exact date order down to the second within the buckets. However, there are still problems – within each bucket, there are fewer documents to draw from. Nevertheless, the search engine will faithfully provide the best documents it has for each bucket. This means that as your bucket size gets smaller, the chances of the bucket getting filled with irrelevant documents become higher. It would be better not to return such a bucket at all, but per our Matching and Sorting by Relevance section, scoring is not absolute, so it might still be difficult to decide which buckets we should omit from the search results.
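Two of the alternatives above, re-sorting the top N by date and bucketing by day, can be sketched in plain Python. This is an in-memory illustration only, not how a search engine would execute it; the event titles, scores, dates, and helper names are all hypothetical.

```python
from datetime import datetime

# Hypothetical results for a "beer events" search, with made-up relevance scores.
EVENTS = [
    {"title": "Speed dating", "start": datetime(2018, 11, 1, 18, 0), "score": 0.2},
    {"title": "Beer festival", "start": datetime(2018, 11, 1, 20, 0), "score": 0.9},
    {"title": "Business dinner", "start": datetime(2018, 11, 2, 12, 0), "score": 0.1},
    {"title": "Brewing class", "start": datetime(2018, 11, 2, 19, 0), "score": 0.8},
]


def top_n_then_date(events, n):
    """Keep the n most relevant events, then re-sort only those by start time."""
    top = sorted(events, key=lambda e: e["score"], reverse=True)[:n]
    return sorted(top, key=lambda e: e["start"])


def bucket_by_day(events):
    """Order by calendar day, and by descending relevance within each day."""
    return sorted(events, key=lambda e: (e["start"].date(), -e["score"]))


resorted = [e["title"] for e in top_n_then_date(EVENTS, 2)]
bucketed = [e["title"] for e in bucket_by_day(EVENTS)]
print(resorted)
print(bucketed)
```

Raising n to 4 in the first helper pulls the speed dating event back to the top of the list, which is the “N is too high” failure described above. The bucketed version keeps every event, so a day with only weak matches still shows up.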

No hard and fast solutions

As you can see in the sections above, I’m doing an excellent job of outlining a huge problem, but I’m not providing any easy solutions. That’s because there aren’t any!

By its very nature, search and recommendation is and forever will be filled with nasty corner cases. Human language is dirty and imprecise, and your users’ information needs will be uncertain and highly varied. However, don’t lose hope. Despite the many corner cases, search technology is still an excellent tool for helping users to satisfy their information needs in the vast majority of use cases.

What’s more, search is getting better. Learning to Rank is a machine learning technique for scoring documents that can automatically find the best balance between features like text relevance, static quality, and innumerable other things. Similarly, there has been lots of conversation in the search community about embedding vectors into search so that traditional inverted-index search can be used in conjunction with recent developments with machine learning (check it out!).

Finally, I would expect the user experience to continue to develop and improve. The dominant search experience for the past 15 years has been a text box at the top of the screen. However, we see this giving way to more conversational search experiences like when you ask Siri to look up a phone number or when you ask Alexa to play a particular song. The visual experiences are changing too. Take a look at Google’s image search. There the left-nav faceted search has been replaced with a very intuitive tag-based slice-and-dice experience that allows you to quickly narrow down the small set of results that fit your information needs. I expect we will continue to get better and better experiences.

You can learn more about building a relevant search by checking out my previous post on understanding the ideas of precision and recall as they relate to search.

Have you run into the fundamental problem of search in your own search applications? What have you done to overcome it? I’d like to hear from you! Ping me on Twitter @JnBrymn or add a response at the bottom of the page. If you’d like, we can jump onto a hangout and share some war stories.

Photo by Andrew Neel on Unsplash

How to Make Swift Product Changes Using a Design System

Redesigning an entire site is a daunting challenge for a frontend team. Developers approach extensive visual changes with caution as they can be challenging. You might have to go through hundreds of stylesheets updating everything from hex values to custom spacing. Did you use the same name for colors on all your files? No typos? Do your colors have accessible contrasts? What a nightmare!

At Eventbrite, our design system helps our developers make those sweeping changes all while saving time and money. Keep reading to see how a design system can help your team with consistency, accessibility, and lightning-fast redesigns.

The Key to Consistency

A design system is a library of components that developers across teams can use as building blocks for their projects. A shared library allows everyone to use components, or reusable chunks of styling and code, that look and work the same way. You don’t want ten similar but different copies of the same thing, do you? Take custom file uploader components, for example. If each team builds their custom version of the component, not only does it create a confusing user experience, but it also means that developers across teams have to maintain and test all of them. No, thank you!

As part of the Frontend Platform team here at Eventbrite, my team and I maintain the Eventbrite Design System (EDS). Because we wrote EDS in React, some of our apps use EDS while legacy apps that use other JS frameworks do not. As we move more of our products over to React, adoption of our design system is increasing. Our user experiences across all of our platforms look and feel more cohesive than ever before. Every EDS file uploader looks and behaves the same way (with minor variations).

Accessibility for All

When everyone uses the same component, you can build accessibility features in one place, and others can inherit them for free. Furthermore, you or a dedicated team can now thoroughly test each component to ensure it works for users of all abilities and needs. The result? People who navigate your site using screen readers or keystrokes can now use your product!

We love taking advantage of this benefit here at Eventbrite. We ensure the colors in our design system components have the right contrast ratios, which means that all Eventbrite pages are usable by people with colorblindness. Our color documentation page uses ChromaJS to help calculate the ratios for our text and color combinations. We also use WCAG AA as our contrast standard.
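For readers curious about what a contrast check involves, here is a minimal Python sketch of the WCAG 2.x contrast ratio formula. It is not the code behind our documentation page (which relies on a JavaScript library); the function names are hypothetical, and the AA thresholds of 4.5:1 for normal text and 3:1 for large text are taken from the WCAG definition.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """WCAG relative luminance of a color given as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """True if the pair meets WCAG AA: 4.5:1 normally, 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


print(round(contrast_ratio("#000000", "#FFFFFF"), 2))
print(passes_aa("#000000", "#FFFFFF"))
```

A call such as passes_aa("#F05537", "#FFFFFF") would run the same check for a brand color against a white background.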

A sample of one of our colors on the Eventbrite Design System colors documentation page. It includes the color name, hex, RGB, and Luma values along with the WCAG score.

We also strive for our components and our pages to work well with keyboards and screen readers. EDS has a Keyboard higher-order component (HOC) where we use react-hotkeys to help us set up our React pages for optimal keyboard accessibility. Eventbrite works towards having all our components be accessible to all. Thanks to our design system, when Frontend Platform doubles down on accessibility, all teams that use EDS inherit the accessibility improvements by keeping up with our latest version.

Quick Turn-Arounds and Fast Redesign

Now, back to the redesign scenario. If you’ve defined all your colors and variables in one place, your team no longer has to hunt down definitions for each component. One developer can change a hex value (say, from #DB5A2C to #F05537), and every app that uses your design system inherits all changes right away.

In spite of all our planning and prep work, every once in a while our team needs to set a tight deadline. In our latest redesign, we made sweeping typography and color changes. While it seemed like a massive task, EDS enabled us to make many of these changes very quickly. We spent most of our time and energy making these changes to our products that don’t yet use EDS and thus require specific updates and quality assurance. Check out the results of the transformation below!

Search Results Page Before the Rebrand

Eventbrite Search Results Page Before Redesign

Search Results Page After the Rebrand

Eventbrite Search Results Page After Redesign

Home Page Before the Rebrand

Eventbrite Home Page Before Redesign

Home Page After the Rebrand

Eventbrite Home Page After Redesign

While adopting, implementing, and maintaining a new design system took serious work, the benefits have been well worth it. A design system might save your team a lot of time and work, too. However, design systems are not a magic bullet, and it takes time to get one right. Don’t despair if yours doesn’t look as fleshed out as some of the more popular and well-staffed design systems, like Google’s Material Design or Airbnb’s Design Language System. Start saving time and money by having a shared library to increase consistency, increase the accessibility of your product, and make broad changes safe. Create a design system as unique as your product and start reaping the benefits.

What about you? Is your team using a design system? Is it a custom-built one? Drop us some lines in the comments below or ping me directly on Twitter @mbeguiluz.

BriteBytes: Diego “Kartones” Muñoz

An Eventbrite original series, BriteBytes features interviews with Eventbrite’s growing global engineering team, shining a light on the individuals whose jobs are to build the technology that powers live experience.

One of my favorite things about Eventbrite is getting to work with engineers from all over the world. In September, I had the pleasure of sitting down with Diego “Kartones” Muñoz, a Principal Engineer visiting Eventbrite’s headquarters in San Francisco from our Spain office. He joined Eventbrite through our Ticketea acquisition in May and works out of Madrid with the Ticketing and Registration Business Unit (TRBU) Mapache team. In this interview, he tells us about his path, what it’s like onboarding onto a larger company, and the things he likes most about working at Eventbrite.

Tamara Chu: How did you come to work for Ticketea/Eventbrite? What was your path as a software engineer?

Diego “Kartones” Muñoz: I started early in development and computers, so before entering university I already knew a bit and wasn’t sure if I wanted to study it or not. I started studying, then I quit after a few years because I thought it was boring [laughs]. I started working, and I felt I was learning way more by working. Since then I’ve switched a lot: I started consulting with .NET, then switched to PHP and more open-source stacks, then I switched to Ruby, and since 2015, Python, which I’m in love with.

In 2009, I switched from consulting for other companies to product development, and since then I have been in multiple different areas: social networks, web gaming portals, mapping tools, video generation tools, and now ticketing.

T: How long had you been at Ticketea before Eventbrite?

D: I joined March 2017, so one year. In total it’s now been one year and a half between Ticketea and Eventbrite.

T: And did you like the culture of Ticketea compared to the other companies you’ve worked at?

D: Yes, that was probably the deciding factor. A friendlier company, not willing to jump on the startup unicorn hype but preferring to focus on a single product; not so worried about growing a lot, but keeping the product stable when adding new features. Also, while Ticketea had investing, it was a small amount, and it was profitable, so it was nice that we weren’t in such a hurry to always be generating lots of new users or lots of new revenue, just growing steady but at a slower pace than other startups.

It’s not that that’s bad in itself, but other places I’ve been were just growing, growing, growing, and they didn’t care about quality as much.

T: Mm, like growth for growth’s sake, no matter what happens to the team or what kind of culture you’re building.

D: Yes, exactly, or when things are failing often because the platform is not stable enough.

T: Has the transition to Eventbrite felt natural? Or what was that shift like?

D: I think for us it has been quite natural, also because our stack at Ticketea was more or less similar; we already used most of the tech stack. [The shift] has been learning a new platform, adjusting to mostly everything in English, and the time difference.

T: Yeah, [the time difference] is a big one the teams are still figuring out. Was there anything about Eventbrite that surprised you when you joined?

D: The size and the scale of some things, like the size of some big events that [Eventbrite] has might be more than the total of what Ticketea sells in one year. And some parts of the technology, you can actually look at it and see that it has years of experience put into there, and [years of] thought evolving those parts. That’s something I appreciate a lot, spending time improving and making things better.

T: Was there something that excited you, like “oh cool, this is something new that I can look into?” Something specific?

D: Yes, for example, the way the APIs work — the internals of how to build and expand them and how they communicate between themselves — it was a problem that I’ve seen in the past but never solved as cleanly as here. I’m not an expert on API development, but here I think we have a good and elegant solution.

T: How were you doing it at Ticketea versus here?

D: For example regarding API design, ours were less advanced, more built in a classical way of “load data, fetch all related entities and return everything.” It was more manual work, without the EB API magic. We also didn’t have the scale as Eventbrite, so usually performance wasn’t a problem; things would go slower, but it would still work. At Ticketea also we were just two technical teams, so also there’s been a big jump to now being part of a company with hundreds of engineers.

T: Was there anything from Ticketea that you wish had come over to Eventbrite?

D: The automated deployment, the quicker release cycles. As we didn’t have Ops, we were all tiny part DevOps, mostly developers. We handled our own infrastructure. That’s also why we were switching from AWS to GCP [Google Cloud Platform] because it removes an additional layer of complexity. So we can self-deploy without systems or release engineers. We had automatic deploys, canary releases, simple traffic splitting, automatic with a slider with one button. Those things, here with so many people and so many services, it’s not as quick.

T: What has been your favorite thing about working at Eventbrite?

D: Probably being able to work on such a big project. Because we’re thinking, you build something, it’s not something that three or four people are going to use, but it’s a thing that millions of people are going to use. But still, I don’t know what else, because it has just been a few months [laughs].

T: [laughs] I’ll ask you again in another 6 months.

D: Yeah, let’s do that!

T: How about your least favorite thing?

D: Adapting, maybe, to the way of releasing things. We have lots of services with complex interactions, so you have to be careful and take additional steps to deploy services. Every change takes extra effort to update and release, etcetera, which I wasn’t used to due to our smaller scale and mostly automated platform.

T: Do you see opportunities to change that?

D: I think yes. I don’t know what the future is for our team, but yes, of course, I feel there are opportunities to improve the way things are done. There’s PySOA (Eventbrite’s Python library for writing microservices and their clients), there are tools in place to migrate services, and probably going to be more alignment between product and tech — is this important, or are there more pressing issues, or can we take advantage of doing something with the service to also separate it?

T: What are you most excited about?

D: All the things that I can learn from the platform. I am just grasping the tip of the iceberg, how everything works: the backend parts, learning React, how the tools we use work (internally), DevOps, the infrastructure that we have, the general learning opportunity of the architecture, and the platform.

Diego has been an active part of Spain’s tech scene for many years, and it’s fantastic having him on the team. Learn more about him at A big thank you to Diego for sharing his background and experience. We’re looking forward to hearing more from him and the rest of the team in the future, so stay tuned for more BriteBytes!

Rethinking quality and the engineers who protect it

Testing software is an important responsibility, but testing is not a synonym for quality. At Eventbrite, we are trying to dig deeper into quality and what it means to be a QA Engineer. This article is not just for QA engineers; it is for anyone who wants to better understand how to deliver higher quality products and better utilize QA resources. If you don’t have QA resources, by the end of this article you will have a better idea of what to ask for when you look to add a QA Engineer to your team.

Rethinking the role

When I sat down to write an updated job description for our QA Engineering position, I started my research by looking at job listings from similar companies. Most of the listings agreed on one thing: QA Engineers test. The specifics vary, but the posting would always include a range of automated and manual testing tasks.

While these testing tasks are worth doing, testing software doesn’t ensure that the output is a high-quality product. In practice, effective QA extends well beyond testing. QA Engineers should ensure teams develop products that work and address a targeted customer need.

The iron triangle

Being a strong advocate for quality requires understanding what could cause quality to suffer. I’d like to start this post by introducing the concept of “The Iron Triangle.” The triangle is a visualization sometimes used to describe the constraints of a project, but it also works as a model for the challenges of maintaining quality.

The idea here is that we constrain the quality of a project by its scope, deadline, and budget (among other factors). Changes to one of these constraints then require balancing adjustments to the others, or quality suffers.

External quality

The team can’t control all of these constraints, but it is critical that they monitor them. These constraints directly impact the quality of work. This sort of quality is external because it is quality as understood by the customer.

Some scenarios

  • A project has a broad scope. The timeline for the project is likely full of feature work, with limited time left for testing tasks. Intervention can mean working to carve out time to write and perform tests, advocating for a reduction in scope, or developing a testing approach that is lean without sacrificing too much coverage.
  • A project has a tight budget. This type of project is likely to have even less time to spend on quality. In these cases, my preference is to establish clear goals and expectations with stakeholders for quality in the planning step. This process enables the team to pack their limited QA time with more precise and targeted testing tasks without misrepresenting how hardened our code may be when we finish the work.
  • A project has an open timeline. This is less common but has its own challenges to quality. When we give plenty of time to projects, they naturally move more slowly. In these situations, it is essential to test along the way, because the closing days of this project can be hectic. I like to limit final testing before release as much as possible with incremental tasks and plenty of automated testing. That way, I can protect the development team from last-minute changes, complexity, and most major bugs.

External quality is linked directly to the success of the business and is everyone’s responsibility. All arms of the business are responsible for maintaining external quality and delivering functional products.

Beyond bugs

I loosely consider an issue a bug any time the software produces an incorrect or unexpected result or behaves in unintended ways. Bugs are going to happen, and minimizing their occurrence is why we test software. However, testing can only protect external quality as far as we understand how the product will be used. You cannot write a test to cover a use case you don’t understand or know about.

If something works as expected but fails to meet the user’s need, this is still an issue of quality. However, this is not a bug. The QA team should bring knowledge of the product and the user to the entire development process. If QA is involved in both the planning phase and the testing phase of development, they can help with more than just finding bugs. They can help ensure developers more thoroughly understand how users employ the products they are building.

Internal quality

That said, there is also an internal, procedural component to quality. Writing code and building products in a way that minimizes technical debt and mitigates risk maintains internal quality. Being good at managing external quality does not make an organization good at managing internal quality.

A new scenario

  • The development team is wrapping up a project and is ready to execute their test plan. Through testing, they uncover some bugs and edge cases that they didn’t think of when writing requirements for the project. To fix these issues, they need to add cyclomatic complexity. This could reduce internal quality and has downstream effects on external quality too. This issue could have been curtailed by involving QA in the writing of product requirements, or by being more deliberate when considering edge cases and architecting the feature.
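To illustrate how late edge-case fixes add cyclomatic complexity, here is a rough Python sketch. It approximates the metric as one plus the number of decision points found by the standard ast module; this is a simplification rather than a full implementation of the metric, and the ticket_price function and its promo object are hypothetical examples, not Eventbrite code.

```python
import ast

# Node types treated as one decision point each (an approximation).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source):
    """Return 1 + the number of branch points in the given Python source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths, one per additional operand.
            complexity += len(node.values) - 1
    return complexity


# The feature as first written from the requirements.
BEFORE = """
def ticket_price(base, quantity):
    return base * quantity
"""

# The same feature after patching edge cases found during late testing.
AFTER = """
def ticket_price(base, quantity, promo=None):
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    total = base * quantity
    if promo is not None and promo.is_valid():
        total -= promo.discount(total)
    if total < 0:
        total = 0
    return total
"""

before_score = approx_cyclomatic_complexity(BEFORE)
after_score = approx_cyclomatic_complexity(AFTER)
print(before_score, after_score)
```

Each guard added after the fact is another path that needs a test, which is why surfacing these cases while writing requirements tends to be cheaper than patching them in at the end.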

Balancing external and internal quality

Good external quality is not an indication of good internal quality. Since QA Engineers are driving external quality, they need to be cognizant of increased complexity as an output of testing. Testing uncovers more than bugs; it also uncovers where the product we are building may be failing to meet user needs. Addressing these gaps is critical to quality, but can have a significant impact on timeline, budget, and scope. Our compromises are likely to produce technical debt.

Technical debt

Technical debt should be a conscious compromise. The development team can give up some internal quality to make the project work within other constraints. Future work to pay off that technical debt often competes for the same development time as work done to fix a bug, and both issues concern overall quality. This can be a confusing number of plates to keep spinning at once. We should neglect neither type of quality work for the other, and understanding their relation to one another is crucial to preserving high overall quality.

One final scenario

  • The business asks for a feature with very narrow scope, a small budget, and a tight deadline. The feature will require new development work on an old, neglected part of the codebase. The development team is worried about losing time to cleaning up technical debt around their integration points and bringing the old code in line with new standards and practices. Testing time for the new feature work is already tight, and the business wants the development team to prioritize keeping the existing feature set healthy. The team needs to make certain compromises to meet their target release date. One of those compromises is balancing investment in internal quality against the external quality of this new feature and the old code.

Protecting quality

While it is critical to be understanding and compromise during development, QA Engineers should remain biased toward quality. The organization has managers charged with protecting budget, scope, and deadlines – but quality should have an advocate too. QA Engineers should spend time encouraging and coaching development teams on bugs and testing tasks, but the real goal should be to encourage those teams to take ownership of quality.

When the user need and the gravity of testing are well communicated to and well understood by developers, they write higher quality code. Developers who understand their users write better tests that leverage user stories, rather than the developer’s expectation of what their code does. Beyond testing functionality, they are making sure that what they have developed aligns with how the product addresses the targeted need.

Engaged developers make the best testers

To be clear, I am advocating that developers do their testing and own their quality. Outsourcing your testing to automation engineers or manual testers is an option, but comes with drawbacks. Developers bring vital skills for driving quality into the product at speed. Engineers are also uniquely positioned to solve problems with their code, and developers that write their tests are more vested in fixing them when they fail.

The QA team can and should assist with this process. They can help developers deliver higher quality products by making sure the project is testable upfront, and making sure the approach to testing is thorough and considerate of other constraints to development. Beyond just saying that “quality should be high”, the team should set expectations for quality within the context of other constraints. These expectations serve two purposes. First, they help with estimation. If you fail to consider QA tasks during estimation, then you have not made time for quality. Second, they bind quality to the development process, fostering ownership within the team. Teams that take ownership of their work are more invested in delivering higher quality products.

The new job description

QA Engineers should protect overall quality. They should work with teams to find the right balance of testing for each unique project. To do this, a good QA Engineer understands quality in the context of other constraints to development and is willing to compromise, but will never allow the business to concede quality. When a business delivers low-quality products, it fails.

SQA Quality Engineer

New Job Listing for QA Engineer

What strategies do your teams use to assure quality? How do you leverage your QA team beyond testing? Tell us about it in the comments and drop me a line on Twitter @aqualityhuman.

The Quest for React Micro-Apps: Single App Mode

Eventbrite’s React applications are a single React app with many entry points. To improve the development experience for both backend and frontend engineers, we implemented a single application mode (codenamed SAM) in our local environments. Whenever the React Docker container boots, it downloads and statically serves a set of pre-built assets for all of the React applications so that Webpack compilation never has to run.

Using a settings file, developers can indicate that they would like to run only their app in an active development mode. Having this feature was another significant milestone towards the quest for micro-apps. Backend engineers no longer have to wait for Webpack to set up to compile and recompile files that they will never change, and frontend developers only need to run Webpack for their app.

The post you are reading is the second in a series entitled The Quest for Micro-Apps detailing how our Frontend Platform team is decoupling our React apps from one another and from our Django monolith application. We are going to do it by creating Micro-Apps so that we can develop and deploy independently. If you haven’t already, check out the Introduction that provided background and overall goals for the project.

A little background

Our React apps are universal apps: they render both client-side in the browser and server-side in Node. Also, as mentioned in the introduction, we have just one single React application with an entry point for every app, which is how we get the different bundles to use for the different apps.

We use Docker for our development environment, which runs many, many containers to spin up a local version of the entire site. One of these containers is our React container that contains all of the React apps. When the container starts, it spawns two Webpack processes that watch for source code changes. The first process writes Node bundles to disk, and the server-side render requests consume them. The second process is a webpack-dev-server process, which creates in-memory bundles and reloads the page once new changes are compiled.

The growth problem

This setup worked fine when we initially created this infrastructure over a year ago, and we had fewer than a dozen apps; the processes ran quickly and development felt very responsive. However, a year later, the number of apps had nearly tripled, and the development environment was starting to feel sluggish, not only for the frontend developers who are living in React-land but also for the backend developers who never touch our React stack.

Our backend engineers developing APIs, working on the monolith, or merely browsing the site locally were spawning those same two Webpack watchers even though they weren’t making any JavaScript changes. Our backend devs were also waiting for the Webpack processes to perform their initial compilation at container start, which wasted a good amount of time. The container was also eating up a lot of memory watching for file changes that would never happen. Backend devs didn’t need Webpack running at all, just for the local site to work.

It was not just the backend devs who were hurting. Because all of the React apps were just a single app with many entry points, we were recompiling the entire app every time a change happened. When a dev made a change to their app, Webpack had to follow all of the other 29 entry points to see if their Node and webpack-dev-server bundles needed to be recreated as well. Why should they have to wait when they only cared about changes to their app? Webpack is smart about knowing what has changed, but it was still doing a whole lot of unnecessary work. Furthermore, at the container start, we were still waiting for the initial Webpack compilation to build all of the other apps, in addition to the one we were working on.

Static apps to the rescue

Our proposed solution was to enable a “static mode” in our development environment. By default, everyone would load the same bundled assets that are used in our continuous integration (CI) server. In this case, we wouldn’t need webpack-dev-server running; we could use a simple static Express server for serving assets. This new approach would greatly benefit our backend engineers who weren’t writing React code.

A developer would have to opt-in to run their app(s) in “dynamic mode.” However, the Webpack processes would only watch specific app(s), significantly reducing the amount of work they would need to do. This approach would greatly benefit our frontend engineers who were working on only an app or two at a time.

Single Application Mode (codenamed SAM) also fit into our long-term strategy of micro-apps. We still want developers to be able to browse the entire site in their local development environment even when all of the React applications are independently developed and deployable. Enabling this mode means that most or all of the local site has to be able to run in “static mode,” similar to a quality assurance (QA) environment. So this milestone not only allows us to break up this mega project but also increases developer productivity while we journey towards the end goal.

How we made it happen

As mentioned in the introduction, this entire endeavor is about replacing the existing infrastructure while it’s still running. Our goal is zero downtime due to bugs or rollbacks. This means that we have to move in smaller phases than if we were just building it greenfield. Phase 1 of this project introduced the concept of “static mode,” but it was disabled by default and it was all-or-nothing; you couldn’t single out specific apps. Once we tested and verified everything was working, we enabled “static mode” by default in Phase 2. After that was successful in the wild, we added “single-application mode” (SAM) in Phase 3.

Phase 0: CI setup

Before anything began, we needed to augment our current CI setup in Jenkins. To run in “static mode,” we decided to use the production assets built for our CI server in our development environment. This way, developers could easily replicate the information in our QA environment within their development environments.

When the code is merged to master, a Jenkins job builds the production assets and uploads a tarball (a package of files compressed with gzip) to the cloud with the build id in its name. Every hour, the latest tarball is downloaded and unpacked on a specific QA machine to create our CI environment.

That tarball is massive because it includes every bit of CSS and JavaScript for the entire site. It takes many minutes to download and unpack the tarball, so we couldn’t use it to seed our development environment. Instead, we created a new tarball of just our React bundles for quicker downloading and unpacking.

Phase 1: All dynamic by default

Then we began building the actual system. It relies on a git-ignored settings.json file that has a configuration for how the system should work:

    {
        "apps": null,
        "buildIdOverride": "",
        "__lastSuccessfulQABuildTime": "2018-06-22T21:31:49.361Z",
        "__lastSuccessfulQABuildId": "12345-master-cfda2b6"
    }

Every time the React container starts, it reads the settings.json file and the apps property that indicates static versus dynamic mode. If the settings.json file doesn’t exist, it gets auto-created with null as the value for the apps property. One or more app names within the apps array means dynamic mode, while an empty array means static mode, and null means use the default.
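The rules for the apps property can be sketched as a small function. This is only an illustration, not the actual start script: resolveMode and its defaultMode parameter are hypothetical names, and the default is passed in because it depends on which phase of the rollout is active.

```javascript
// Hypothetical helper: maps the `apps` setting to a mode.
//   null           -> use the default
//   []             -> static mode
//   ['some-app']   -> dynamic mode
const resolveMode = (apps, defaultMode = 'dynamic') => {
    if (apps === null || apps === undefined) {
        return defaultMode;
    }

    return apps.length > 0 ? 'dynamic' : 'static';
};

console.log(resolveMode(null));           // dynamic
console.log(resolveMode([]));             // static
console.log(resolveMode(['playground'])); // dynamic
```

Keeping the default as a parameter means that switching from dynamic-by-default to static-by-default is a one-value change.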

If the settings file indicates static mode, we retrieve the latest QA tarball stored in the cloud and unpack it locally where the Webpack compiled bundles would have been. We choose the latest build on QA instead of the HEAD of master so that what’s running locally will match what’s currently running on QA. The __lastSuccessfulQABuildTime and __lastSuccessfulQABuildId properties are logging information written out in static mode to help with later debugging.

Now, instead of running webpack-dev-server, we just run a static Express server to serve all of the static bundle assets. Our server-side React renderer is already reading bundles written to disk by the first Webpack process, so it doesn’t have to change at all; those bundles now just happen to come from the tarball.

Here’s the gist of the Docker start script:

(async () => {
    // create settings.json file w/ default settings if it doesn't exist yet

    // fetch prebuilt bundles from cloud, use `--no-fetch` to bypass
    if (!process.argv.includes('--no-fetch')) {
        try {
            await spawnProcess('yarn fetch:static');
        } catch (e) {
            // error handling omitted
        }
    }

    if (shouldServeDynamic()) {
        // run webpack in normal development mode
        spawnProcess('yarn dev');
    } else {
        // run static server to serve prebuilt bundles
        spawnProcess('yarn serve:static');
    }
})();

A developer can also select a specific tarball with the buildIdOverride property instead of using the most recent QA tarball. This is a rarely used feature, but comes in handy when needing to test out a release candidate (RC) build (or any other build) locally.
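The choice of tarball can be pictured as a one-line rule. This is a minimal sketch under assumptions: pickBuildId is a hypothetical name, the release candidate build id is made up for the example, and an empty buildIdOverride is taken to mean "no override," matching the default settings shown earlier.

```javascript
// Hypothetical helper: an explicit override wins, otherwise use the latest QA build.
const pickBuildId = (settings, latestQABuildId) => (
    settings.buildIdOverride || latestQABuildId
);

console.log(pickBuildId({buildIdOverride: ''}, '12345-master-cfda2b6'));
// 12345-master-cfda2b6
console.log(pickBuildId({buildIdOverride: '12400-rc-0a1b2c3'}, '12345-master-cfda2b6'));
// 12400-rc-0a1b2c3
```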

The key with this phase was minimal disruption. To start things off, we defaulted to dynamic mode, the existing way things worked. If any app was listed (i.e. apps was non-empty), we would run all the apps in dynamic mode, using Webpack to compile the changes.

When this was released, everything worked the same as before. Most folks didn’t even realize that the settings.json file was being created. We found some key stakeholders to explicitly enable static mode and worked out the kinks for about a week before moving on to Phase 2.

Phase 2: All static by default

After we felt confident that the static mode system worked, we wanted to make static mode the default, a huge win for the backend engineers. First we announced it in our weekly Frontend Guild meeting and asked all the frontend developers to start explicitly listing the names of their app(s) in the apps property within the settings.json file. This way when we flipped the switch from dynamic-by-default to static-by-default, their environment would continue to run in dynamic mode.

    {
        "apps": ["playground"],
        "buildIdOverride": "",
        "__lastSuccessfulQABuildTime": "2018-06-22T21:31:49.361Z",
        "__lastSuccessfulQABuildId": "eventbrite-25763-master_16.04-c1d32bb"
    }

It was at this point that we wished we had a feature flag or rollout system for our development infrastructure, like the feature flag system we have for the site where we can slowly roll out features to end users. It would’ve been nice to be able to turn on static-by-default to a small percentage of devs and slowly ramp up to 100%. That way we could handle bugs before they affected all developers.
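To make the idea concrete, here is a rough sketch of what a percentage rollout for developers could look like. We did not have anything like this; getBucket and isStaticByDefault are hypothetical names, and hashing the username is just one possible way to place each developer in a stable bucket.

```javascript
const crypto = require('crypto');

// Hypothetical: deterministically place a developer in a bucket from 0 to 99
// by hashing their username, so the same person always gets the same answer.
const getBucket = (username) => {
    const hash = crypto.createHash('sha256').update(username).digest();

    // read the first 4 bytes as an unsigned integer and reduce it to 0-99
    return hash.readUInt32BE(0) % 100;
};

// A developer is in the rollout when their bucket falls below the percentage.
const isStaticByDefault = (username, rolloutPercent) => getBucket(username) < rolloutPercent;

console.log(isStaticByDefault('some-dev', 0));   // false: nobody is included at 0%
console.log(isStaticByDefault('some-dev', 100)); // true: everybody is included at 100%
```

Because the bucket never changes for a given username, raising the percentage only ever adds developers to the rollout; nobody flips back and forth between modes.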

Without such a system, we had to make the code change that enabled static mode as the default and just hope that we had adequately tested it! Now any developer who hadn’t specified an app name (or names) in their settings.json would get static mode the next time their React container restarted. We ran into a few edge case problems, but nothing major. After about a week or two, we resolved them all and moved on to Phase 3.

Phase 3: Single-application mode (SAM)

Single-application mode (codenamed SAM) was the actual feature we wanted. Instead of having to choose between all-dynamic or all-static, we started reading the apps property to determine which apps to run in dynamic mode while leaving the rest in static mode.

Before, in all-dynamic mode, we determined the entry points by finding all of the subfolders within the src folder that had an index.js entry point. Now, with single-application mode, we just read the apps property in settings.json to determine the entry points. All other apps are run in static mode.

/**
 * returns an object with appName as key and appPath as string value to be consumed by webpack entry key
 */
const getEntries = () => {
    const appNames = getSettings().apps || [];
    const appPaths = appNames
        .map((appName) => path.resolve(__dirname, appName, 'index.js'))
        .filter((filePath) => fs.existsSync(filePath));

    if (_.isEmpty(appPaths)) {
        throw new Error('There are no legitimate apps to compile in your entries file. Please check your settings.json file');
    }

    const entries = appPaths
        .reduce((entryHash, appPath) => {
            const appName = path.basename(path.dirname(appPath));

            return {
                ...entryHash,
                [appName]: appPath,
            };
        }, {});

    return entries;
};

Before single-application mode, we ran a simple Express server for all-static and webpack-dev-server for all-dynamic. With SAM we have a mixture of both modes. However, we cannot run both servers on a single port. So we decided to only use webpack-dev-server and add middleware that would determine whether or not the incoming request was for an app running in dynamic or static mode. If it’s a static mode request, we just stream the file from the file system; if it’s a dynamic request we route to the appropriate webpack-dev-server using http-proxy-middleware.

const appNames = getSettings().apps || [];

// Object of app names and their corresponding ports to run on
const portMap = appNames.reduce((portMap, appName, index) => ({
    ...portMap,
    [appName]: STARTING_PORT + index,
}), {});

// Object of proxy servers, used to route incoming traffic to the appropriate client dev server
const proxyMap = appNames.reduce((proxyMap, appName) => ({
    ...proxyMap,
    [appName]: proxyMiddleware({
        target: `${SERVER_HOST}:${portMap[appName]}`,
    }),
}), {});

// call each workspace's `yarn start` command to kick off their respective webpack processes
appNames.forEach((appName) => {
    spawnProcess(`yarn workspace ${appName} start ${portMap[appName]}`);
});

const app = express();

// Setup proxy for every appName in settings. All devMode content requests will be
// forwarded through these proxies to their corresponding webpack-dev-servers
app.use((req, res, next) => {
    const appName = path.parse(req.originalUrl).name.split('.')[0];

    if (proxyMap[appName]) {
        return proxyMap[appName](req, res, {});
    }

    next();
});

// by default serve static bundles
app.use(ASSET_PATH, express.static(BUNDLES_PATH));

// start the static server

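To see how a request gets matched to an app, here is a small standalone sketch of the two pure pieces of the server code: the port map and the app name lookup. The starting port, the app names, and the bundle file names (such as playground.web.js) are assumptions made up for the example; only the path.parse expression is taken from the middleware itself.

```javascript
const path = require('path');

// Assumed values for illustration; the real constants are not shown in this post.
const STARTING_PORT = 9000;
const appNames = ['playground', 'search'];

// Mirrors the portMap reduce: each dynamic app gets the next port
const portMap = appNames.reduce((map, appName, index) => ({
    ...map,
    [appName]: STARTING_PORT + index,
}), {});

// Same expression the middleware uses to find the app name in a bundle URL
const getAppName = (url) => path.parse(url).name.split('.')[0];

console.log(portMap);
// { playground: 9000, search: 9001 }
console.log(getAppName('/bundles/playground.web.js'));
// playground
console.log(getAppName('/bundles/checkout.web.js') in portMap);
// false, so this request falls through to the static bundles
```

Since the lookup relies on the first segment of the file name, this approach assumes every bundle file starts with the name of the app it belongs to.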

Issues are likely to arise with any significant change, and the change for developers to only run their app in dynamic mode was huge. Here are a couple of issues we encountered that you can hopefully avoid.

The Common Chunk

Because all of our different apps were just entry points in one big monolith app, we were able to leverage Webpack’s CommonsChunkPlugin to create a shared bundle that contains the common dependencies between all of the apps. That way when our users moved between apps, after visiting the first app, they would only have to download app-specific code. Even though this is a production optimization, we built the common chunk in our development environment with webpack-dev-server as well.

Unfortunately, the common chunk broke when multiple apps were specified. Although it’s called SAM (single-application mode), the system supports specifying multiple applications that developers would like to run in dynamic mode simultaneously. While we tested that multiple apps worked in SAM, we did the majority of our testing with just one application, which is the common use case.

We include this common chunk in the tarball that gets downloaded, unpacked, and read in static mode. However, when running two apps in dynamic mode, the local common chunk would only consist of the commonalities between the two apps, not all 30+. So using the statically built common chunk caused errors in those apps running in dynamic mode.
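A toy model makes the mismatch easier to see. This is a simplification under assumptions: the dependency lists are invented, and a module is assumed to land in the common chunk only when every entry point being built uses it.

```javascript
// Hypothetical dependency lists; real apps have many more modules.
const deps = {
    appA: ['react', 'lodash', 'moment'],
    appB: ['react', 'lodash'],
    appC: ['react'],
};

// Modules shared by every one of the given apps
const commonChunk = (appNames) => appNames
    .map((appName) => deps[appName])
    .reduce((shared, appDeps) => shared.filter((dep) => appDeps.includes(dep)));

console.log(commonChunk(['appA', 'appB', 'appC'])); // [ 'react' ]
console.log(commonChunk(['appA', 'appB']));         // [ 'react', 'lodash' ]
```

When only appA and appB are built locally, their bundles expect lodash to come from the common chunk. The statically built common chunk, computed across all apps, only contains react, so lodash is missing at runtime.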

Our initial fix was to update the webpack-dev-server middleware to also handle requests for the common chunk. However, this swung the pendulum in the opposite direction. It fixed the common chunk problem for multiple dynamic apps, but now all of the static apps were no longer using the statically built common chunk. They were using the locally built dynamic common chunk. So now all the static apps were broken.

In the end, since the common chunk is a production optimization, we elected to get rid of it in dynamic dev mode. So now no matter how many apps a developer specifies in the apps property of the settings.json, they won’t get a common chunk. However, we still need to keep the common chunk for the static mode apps for now, since the QA environment builds the apps where the common chunk still exists.

“Which mode am I in?”

Another issue we ran into wasn’t a bug, but a consequence of introducing static mode: developers didn’t know which mode they were in. Some backend developers weren’t even aware there was a static mode to begin with; they would try to make changes to an app and wonder why their changes weren’t being reflected. The problem was exacerbated when we introduced SAM in Phase 3 because one app would update while another would not. We on the Frontend Platform team found ourselves troubleshooting a lot of issues that ultimately were rooted in the fact that the engineer didn’t know which mode they were in.

The solution was to add an overlay message to the base HTML template that all the apps shared. It reads the settings.json file and determines which mode the currently displaying app is in, including the app name. If the app is in static mode it mentions how long it has been since its last refresh.

If the app is in dynamic mode, it says “webpack dev mode.”
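The logic behind the overlay can be sketched roughly as follows. This is not the actual template code: getOverlayMessage is a hypothetical name, the message wording is illustrative, and using __lastSuccessfulQABuildTime as the “last refresh” time is an assumption.

```javascript
// Hypothetical helper that builds the overlay text for the app being displayed.
// `settings` has the same shape as settings.json; `now` is passed in for testability.
const getOverlayMessage = (settings, appName, now = new Date()) => {
    const dynamicApps = settings.apps || [];

    if (dynamicApps.includes(appName)) {
        return `${appName}: webpack dev mode`;
    }

    const lastBuild = new Date(settings.__lastSuccessfulQABuildTime);
    const hoursAgo = Math.floor((now - lastBuild) / (60 * 60 * 1000));

    return `${appName}: static mode, last refreshed ${hoursAgo} hour(s) ago`;
};

const settings = {
    apps: ['playground'],
    __lastSuccessfulQABuildTime: '2018-06-22T21:31:49.361Z',
};
const now = new Date('2018-06-23T00:31:49.361Z');

console.log(getOverlayMessage(settings, 'playground', now));
// playground: webpack dev mode
console.log(getOverlayMessage(settings, 'search', now));
// search: static mode, last refreshed 3 hour(s) ago
```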

It turned out that mentioning the app name was also crucial because if a dev needed to work on a page that wasn’t their own, they wouldn’t always know which app needed updating.

The results are in

Our hypotheses about the benefits of the project panned out. We started hearing fewer and fewer issues from our backend engineers about the React container failing to boot. Less troubleshooting meant more time for development. Unfortunately, we don’t collect any metrics on individual engineers’ development environments, so we don’t have any hard numbers on how much faster the container boots or how much memory usage decreased.

The biggest win for the frontend engineers was the reduction in Webpack recompile time when making changes to files. Previously Webpack traversed through all of the entry points, and now it only has to look at one (or however many the developer indicates in settings.json). The rebuild time was 2x or 3x faster, and we received lots of positive feedback.

So even though the SAM project was just a milestone in the overall endeavor to enable Micro-Apps, we were able to deliver lots of value to teams in the interim.

Coming up next

Late last year we started hearing some mysterious, but sparse reports from one or two frontend engineers that at some point Webpack would stop rebuilding when they were making changes. Over time as the engineering team added more apps and more Docker containers, the problem grew to affect almost all frontend engineers. It was even happening to us on the Frontend Platform Team.

We suspected it to be a memory issue, but we weren’t sure of the source. We crossed our fingers hoping that the SAM project would fix the issue, but we were still able to trigger the problem even when only running a single app. Things were still on fire, and we realized that we couldn’t move forward with the quest for Micro-Apps until we resolved the instability issues. Any new features wouldn’t have the desired impact if the overall system was still unstable.

In the third post in the series, I will cover this topic in detail. In the meantime, have you ever managed a similar system? Did you face similar challenges? Different challenges? Let us know in the comments or ping me directly on Twitter at @benmvp.

The Quest for React Micro-Apps: The Beginning

Eventbrite’s site started as a typical mid-2000s monolithic, server-rendered application. Although we recently moved to a React stack, we have run into inflexibility, tight coupling, and scaling issues.

The Frontend Platform team wants to give developer teams autonomy, flexibility, and most importantly ownership of their apps so that they can move at the pace they need to provide value to our users. We have a vision: we want to get to a world where each React application can be both developed and deployed individually. In short, we want micro-apps. In this blog post series, we relate our quest for this vision, so keep on reading!

It’s been a long journey

Eventbrite built its website in the mid-2000s before the concept of a JAMstack (sites built solely on JavaScript, APIs, and Markup) was ever a thing. As a result, the site was a typical monolith application where the backend code (Python) rendered the frontend (HTML) to generate a website. In modern web architecture, we now create an entirely separate API/services layer so that there can be other data consumers, such as mobile apps or external developers.

Later on the frontend, we sprinkled in some jQuery for light client-side interactions. Once we needed more sophisticated experiences, we started using Backbone (and then Marionette). Then in early 2016, the Frontend Platform team added a React-based stack, with the hope of deprecating the legacy jQuery and Backbone apps over time.

Eventbrite isn’t one SPA (single-page application), but a collection of many applications. Sometimes an application is as big as a whole section of the site, like Event Creation/Management or Search & Browse, and other times it’s just a single admin page. In all cases, however, they are universal React apps rendered both server- and client-side.

If you’re interested in how we accomplished server-side rendering with our Django backend, take a look at a talk I gave last year on it:

Not always sunny

Although we’re moving more server-side logic into microservices accessible via the Eventbrite APIv3, our React apps are still tied to the core monolith in many unfortunate ways:

React Server-side rendering

We render server-side through our Django monolith (watch the video for more details), so the Django layer makes calls to the microservices directly to retrieve initial data. These calls are mimicked in JavaScript for subsequent client-side data retrieval.

Django HTML templates

The HTML templates used to hydrate the React apps initially are in Django-land, so all the data and environment information (locale and other context) have to come from the monolith.

Same repository

For the reasons above, to create a React application, you also need to create some Django scaffolding, including routing. As a result, the React apps live in the same repo as the core monolith so that developers don’t have to try to keep two separate-yet-not-separate repositories in sync.

Shared package.json

Our React apps themselves aren’t truly separate. They are technically multiple entry points within a single React monolith that have a single package.json and shared bundling, transpilation, and linting configurations. If one team wants to change a dependency for their app, they need to ensure it doesn’t break the 29 others.

Cross-app dependencies

Because all of the apps come together under one single app, we can import components and utilities across applications. We’ve tried to actively discourage this, but it still happens. Instead, we’ve advised teams to put shared dependencies in the (unversioned) “common” folder.

Constant vigilance

The Frontend Platform team currently oversees the dependencies that all the apps use. We need to ensure development teams don’t accidentally back us into a corner with a library choice that prevents us from moving the platform forward in the future. We also need to make sure that those apps not actively being developed do not break with dependency changes.

Unscalable architecture

If the number of our development teams doubled, everything would probably grind to a halt. Eventbrite already has development teams on three continents across four time zones, so the status quo won’t scale.

We have a vision

We need to give teams autonomy, flexibility, and most importantly ownership of their apps so that they can move at the pace they need to provide value to our users.

We have a vision: we want to get to a world where each React application can be both developed and deployed individually; we want micro-apps. For development, devs wouldn’t need the rest of the site running. They could just build their app on their local machine talking to APIs running on our QA environment. Moreover, for deployment, the entire site wouldn’t need to be deployed to deliver new code to our users for a specific app. However, while the apps are independent, they must still feel cohesive and consistent with the rest of the site for our end users.

Micro-apps aren’t a novel idea in the industry, but we believe that it will be immensely transformational for us.

Our quest

The thing is, the Frontend Platform team can’t just disappear for 6+ months and come back with a shiny new environment. It is too risky. It’s uncertain because the project is so massive. Moreover, it’s dangerous because it’s all or nothing. If at five months the company’s priorities change and we need to work on something more important, we would have five months of sunk cost.

So the plan is to rebuild the entire plane while it’s cruising at 36,000 feet. We’ll work on this project iteratively, breaking it down into smaller goals so that we can provide value frequently. It’d be like flying from SFO to JFK and midway through getting more legroom, free Wi-Fi, or lie-flat seats. We never want to be too far from a place where we can pause the project to work on something of greater importance. If all you got during the flight was the legroom and Wi-Fi, that would be better than having to wait for another flight to get all three.

You may have noticed that I haven’t been speaking in the past tense but in the present. That’s because we’re not done! We want to share our learnings as we go; not just the technology, but also the logistics and processes behind it. We want to share what worked, what didn’t, and what challenges we faced in hopes that you will be able to learn from what we’ve accomplished in real time.

We’re applying the same iterative approach to this series, so I’m not quite sure how many posts there will be. The team has a rough breakdown of the milestones that we want to hit and the value they provide. However, there may not be a one-to-one mapping between milestones and articles.

In any event, let’s kick things off with Part 1: Single App Mode.

Simple and Easy Mentorship with a Mentoring Agreement

Mentoring is hard. Mentors and mentees usually have many things on their respective tables between work, personal projects, and their training paths. Learning opportunities are infinite, but the time available is not. How can we foster productive mentoring relationships without consuming our time communicating and aligning our expectations?

Read on to learn how a mentoring agreement can help you streamline the mentor-mentee relationship, making communications more efficient, and setting the – sometimes hidden – expectations on both sides of the deal.

My struggles navigating the mentorship program

At Eventbrite, we run an engineer mentorship program. Over six months, developers and leaders both mentor and receive mentorship from their peers. A committee matches participants depending on the skills they want to learn or teach.

The program has happened a couple of times already, and I have always had hardworking mentees and great mentors. However, during the initial cycle, I struggled with several aspects of the relationship. The first issue was accountability and commitment: How could I motivate my mentees to get things done and make the most of our time? Also, how do I continue to motivate without coming off as pushy or too demanding? Other challenges I faced were inefficient communications or lack of clarity in terms of goals and expectations. As a mentee myself, I assumed my mentors might be experiencing similar challenges.

With these issues in mind and craving to improve, I did some research and looked for solutions. Inspired by 6 Things Every Mentor Should Do and Kim Clayton’s talk Overcoming the Challenges of Mentoring, I arrived at a process that includes a mentoring kickoff meeting, where mentor and mentee discuss a mentoring agreement.

The mentoring kickoff meeting

The mentoring kickoff meeting is a quick gathering where mentor and mentee set goals and talk about how they will measure their achievement. In that meeting, you could also:

  • Set hourly commitments and cadence of meetings and communications.
  • Draft a plan of action for the whole mentorship period.
  • Arrange a review meeting later on, where you and your mentor/mentee can sit down to evaluate the relationship.

However, the most critical part of the kickoff meeting is to read, understand and clarify the points of the mentoring agreement.

What is a mentoring agreement?

A mentoring agreement is a reference document where mentor and mentee agree on their commitments for the period they work together.

A mentoring agreement can enrich the mentor-mentee relationship with the following qualities:

  • Clear expectations. The agreement highlights what mentor and mentee are going to do, establishing a two-way relationship. The shared expectations make accountability an official part of the mentorship experience and help identify areas where either mentor or mentee needs extra support.
  • Honest communication. The agreement specifies how communication should happen between the two participants, establishing the channels you are going to use and striving for open and transparent communication.
  • Goals and deadline setting. Discussing what the mentee will do and agreeing to a timeline is an essential component of this document, especially in terms of keeping both parties on track and the overall experience productive. You need to know what success looks like to achieve it.

I like to keep the mentoring agreement short, with five to eight bullet points per role. Some points are intentionally vague, leaving room for interpretation and ongoing discussion.

My agreement

Here is the mentoring agreement that I propose to my mentors and mentees for a healthy and productive relationship:

A Mentor

  • Is there to offer support as a guide
  • Will push the mentee to produce their best work
  • Acknowledges the work put forward by the mentee
  • Prepares the mentee to become a mentor

A Mentee

  • Must finish homework on time and with quality
  • Will graduate after <agreed period>
  • Should let the mentor know if anything is not clear
  • Sets the meeting agenda and shares it with enough time for the mentor to prepare
  • Suggests activities and exercises to do together
  • Welcomes constructive criticism
  • Should keep the relationship going

Both Mentor and Mentee

  • Should be responsive and communicative
  • Should get to know each other

The value of a process

Subscribing to a mentoring agreement sets the expectations of the mentor-mentee relationship, streamlines communication and highlights the goals and deadlines of the interaction.

Although you could say this is all common sense, there is value in making the shared terms explicit. It is more efficient, as you compress several conversations into one. Moreover, you demonstrate the value you bring to the mentorship experience by running it like a pro.

On my first try, this agreement worked well: it reduced communication overhead, and my relationships became more productive. I will admit that from time to time I have let a deadline slide for fear of affecting the relationship. I know! I should stick to the agreement, but I guess that’s material for another blog post.

Would you add anything else to this agreement? Is there something you think is helpful to mention? Drop me a line below or ping me on Twitter @golodhros.

Photo by Mimi Thian on Unsplash