How Your Company Can Support Junior Engineers

My first day as a junior software engineer was one of the most mentally draining days of my life. The next three months on the job weren’t much better. I didn’t leave my house except to go to work or attend a work event. I remember thinking, What did I sign up for?

Read on to learn what your company needs to provide to avoid overwhelming situations for junior engineers and to make it easy for them to be a contributing member of your organization.

Being a junior engineer

I had been coding since I was twelve, but coding was fun. Working as a software engineer was something else. I felt like everyone else in the room was reading Shakespeare and I was trying to get through a copy of Goodnight Moon.

I was the first junior engineer that my company hired. Ticketfly – now part of Eventbrite – had a fantastic attitude about onboarding a junior hire, but it was still tough. My mentor built out an incredible Ember.js app where I had to solve small (I sure didn’t think they were small at the time) problems in JavaScript to make the tests pass. After that, I worked on a small app for fun applying the same framework the company used.

Ticketfly made a meaningful investment in my career, and also took a significant risk in hiring a new junior engineer. However, it paid off for them – and for me. I stayed with Ticketfly through two acquisitions and was promoted to senior software engineer. Also, I’m now a senior software engineer at Eventbrite.

With the lack of senior engineers available in the hiring pool, many companies are now hiring junior engineers – often hiring them right out of coding bootcamps – or creating apprenticeship programs within the company. This is a smart strategy to add talent to an engineering team, but while hiring an experienced or senior engineer adds a contributing member to your team, hiring a junior engineer is a long-term investment.

If companies rush to hire junior engineers before confirming that they have the support system, the resources, senior team members willing to mentor new engineers, and a realistic timeline for when a junior engineer will add value to the team, everyone suffers.

In the next sections I’ll describe some of the elements necessary for a robust junior engineering training program at your company.


At Eventbrite, we have outstanding support systems in place for our junior engineers. It starts at an orientation where you’re assigned a ‘buddy’ who helps you with anything from finding your way around the office to helping you set up your development environment. Eventbrite runs a mentorship program where we match mentees to a mentor based on what they want to learn. We also host weekly Code Labs where senior engineers teach a new topic each week. The Code Lab is a comfortable environment for the junior to ask questions without feeling incompetent.

Take away

Do you have a knowledgeable engineer willing to take the time to mentor and pair with the new hire? Onboarding and teaching a junior engineer is a considerable responsibility. It could take months before they’re entirely independent. It’s beneficial to determine whether you can afford to hire someone who might take months to ramp up.

Managing expectations

One of the incredible things about my current company is its focus on jumpstarting the careers of junior engineers. Eventbrite has hired more bootcamp graduates from Hackbright (an all-women coding bootcamp) than any other company out there.

Some of our junior hires are the best engineers I’ve ever worked with. While each new junior hire took months to bring up to speed and train – the investment we’ve made in hiring more junior developers has paid off for Eventbrite. Most of the hires are still at the company years later, and are exceptional contributors to the codebase.

Take away

Before hiring a new junior engineer, think about your timing. Would you have time to mentor this engineer?

Pair programming

Having a culture of pair programming is a great benefit when hiring a junior engineer. One of the great things for me when I started out my career was that my mentor would pair with me when I got stuck on an issue, but he also encouraged me to go figure things out on my own. It was a great mix because I had support so I wouldn’t feel stressed, but I also was given the independence to learn and grow.

We practice driving and navigating pairing at Eventbrite. It helps to keep both members of a pair actively engaged in solving the problem. Based on my experience, it’s useful to hire several junior engineers at a time so they can pair together and support each other. It’s also helpful to the company because you can onboard and train them at the same time.

Take away

Try assigning the junior engineer an easy task and letting them take a stab at it first. If they run into problems, try pairing on the problem. Eventbrite buddies have regular check-ins with new hires to make sure things are running smoothly.


Eventbrite hosts an incredible training session on React with our principal Front-End engineer, Ben Ilegbodu. The session walks you through building out an application in React and provides solutions in separate branches if you get stuck so that anyone can keep up with the pace. It’s incredibly welcoming and inclusive when you’re just starting out.

Most companies tailor their onboarding programs toward engineers with years of experience. This practice can be frustrating to a junior engineer. One of the great things that Eventbrite does to support junior engineers is to provide a buddy just for setting up your dev environment.

Another side of it could be online training. Eventbrite provides Udemy accounts for all employees. I’m continually taking online courses, and it’s fantastic to work at a company that supports continuing your education.

Take away

Creating a small repository with tests that your junior engineer needs to get passing can get someone onboarded quickly. Consider finding a training program if your company doesn’t have the resources to create one internally. Find an interesting conference that offers a workshop.

Wrapping up

My first company took a chance by hiring a junior engineer and I’ll always appreciate that they made that commitment. While I was their first junior engineer, they had a strategy, realistic expectations, provided me with a strong support system, and invested in my learning. That’s what it takes to support a junior engineer who has a shot at becoming an asset to your team.

Do you remember your first day as an engineer? Share your funny junior engineering mistake in the comments below or reach out to me on Twitter @randallkanna.

Creating Flexible and Reusable React File Uploaders

The Event Creation team at Eventbrite needed a React based image uploader that would provide flexibility while presenting a straightforward user interface. The image uploader components ought to work in a variety of scenarios as reusable parts that could be composed differently as needs arose. Read on to see how we solved this problem.

What’s a File Uploader?

In the past, if you wanted to get a file from your web users, you had to use a “file” input type. This approach was limited in many ways, most prominently in that it’s an input: data is only transmitted when you submit the form, so users don’t have an opportunity to see feedback before or during upload.

With that in mind, the React uploaders we will be talking about aren’t form inputs; they’re “immediate transport tools.” The user chooses a file, the uploader transports it to a remote server, then receives a response with some unique identifier. That identifier is then immediately associated with a database record or put into a hidden form field.

This new strategy provides tremendous flexibility over traditional upload processes. For example, decoupling file transportation from form submissions enables us to upload directly to third-party storage (like Amazon S3) without sending the files through our servers.

The tradeoff for this flexibility is complexity; drag-and-drop file uploaders are complex beasts. We also needed our React uploader to be straightforward and usable. Finding a path to provide both flexibility and ease-of-use was no easy task.

Identifying Responsibilities

Establishing the responsibilities of an uploader seems easy… it uploads, right? Well sure, but there are a lot of other things involved to make that happen:

  • It must have a drop zone that changes based on user interaction. If the user drags a file over it, it should indicate this change in state.
  • What if our users can’t drag files? Maybe they have accessibility considerations or maybe they’re trying to upload from their phone. Either way, our uploader must show a file chooser when the user clicks or taps.
  • It must validate the chosen file to ensure it’s an acceptable type and size.
  • Once a file is picked, it should show a preview of that file while it uploads.
  • It should give meaningful feedback to the user as it’s uploading, like a progress bar or a loading graphic that communicates that something is happening.
  • Also, what if it fails? It must show a meaningful error to the user so they know to try again (or give up).
  • Oh, and it actually has to upload the file.

These responsibilities are just a short list, but you get the idea: they can get complicated very quickly. Moreover, while uploading images is our primary use case, there could be a variety of needs for file uploading. If you’ve gone to all the trouble to figure out the drag / drop / highlight / validate / transport / success / failure behavior, why write it all again when you suddenly need to upload CSVs for that one report?
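To make the validation responsibility concrete, here is a minimal sketch of a type-and-size check. The accepted types and the size limit are assumptions chosen for illustration; the article does not state the real limits. The function only relies on the `type` and `size` properties that browser File objects expose.

```javascript
// Hypothetical limits; the article does not state the real ones.
const ACCEPTED_TYPES = ['image/jpeg', 'image/png', 'image/gif'];
const MAX_SIZE_BYTES = 10 * 1024 * 1024; // 10 MB, chosen for illustration

// Returns a list of human-readable problems; an empty list means the file is acceptable.
function validateFile(file, acceptedTypes = ACCEPTED_TYPES, maxSizeBytes = MAX_SIZE_BYTES) {
    const errors = [];

    if (!acceptedTypes.includes(file.type)) {
        errors.push(`Unsupported file type: ${file.type || 'unknown'}`);
    }

    if (file.size > maxSizeBytes) {
        errors.push(`File is too large: ${file.size} bytes (limit ${maxSizeBytes})`);
    }

    return errors;
}
```

Returning a list rather than throwing lets the caller decide how to present each problem, which fits the "meaningful error" responsibility in the list.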

So, how can we structure our React Image Uploader to get maximum flexibility and reusability?

Separation of Concerns

In the following diagram you can see an overview of our intended approach. Don’t worry if it seems complicated; we’ll dig into each of these components below to see details about their purpose and role.

React Architecture illustration showing overview of several components that will be enumerated below.

Our React based component library shouldn’t have to know how our APIs work. Keeping this logic separate has the added benefit of reusability; different products aren’t locked into a single API or even a single style of API. Instead, they can reuse as much or as little as they need.

Even within presentational components, there’s opportunity to separate function from presentation. So we took our list of responsibilities and created a stack of components, ranging from most general at the bottom to most specific at the top.

Foundational Components

Recap of the architecture illustration from above with "Foundational Components" highlighted.


The UploaderDropzone component is the heart of the uploader UI, where the action begins. It handles drag/drop events as well as click-to-browse. It has no state itself, only knowing how to normalize and react (see what I did there?) to certain user actions. It accepts callbacks as props so it can tell its implementer when things happen.

Illustration showing a clear pane representing UploaderDropzone over a simple layout in React

It listens for files to be dragged over it, then invokes a callback. Similarly when a file is chosen, either through drag/drop or click-to-browse, it invokes another callback with the JS file object.

It has one of those inputs of type “file” that I mentioned earlier hidden inside for users who can’t (or prefer not to) drag files. This functionality is important, and by abstracting it here, components that use the dropzone don’t have to think about how the file was chosen.
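As a rough sketch of that abstraction, the helper below extracts the chosen file from either a drop event or a file-input change event and hands it to a single callback. The function names and the `onReceiveFile` callback name are assumptions for illustration, not the component's actual API; `dataTransfer.files` and `target.files` are the standard browser properties.

```javascript
// Returns the first chosen file from either a drop event or a file-input
// change event, or null when the event carries no file.
function getFileFromEvent(event) {
    // Drop events expose files on event.dataTransfer; change events on event.target.
    const fileList =
        (event.dataTransfer && event.dataTransfer.files) ||
        (event.target && event.target.files) ||
        [];

    return fileList.length > 0 ? fileList[0] : null;
}

// Hypothetical factory: builds one handler usable for both `drop` and `change`,
// so the implementer's callback never learns how the file was chosen.
function createFileHandler(onReceiveFile) {
    return (event) => {
        // Without preventDefault the browser would navigate to a dropped file.
        if (typeof event.preventDefault === 'function') {
            event.preventDefault();
        }

        const file = getFileFromEvent(event);

        if (file) {
            onReceiveFile(file);
        }
    };
}
```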

The following is an example of UploaderDropzone using React:

    <UploaderDropzone>
        Drag a file here!
        <Icon type="upload" />
    </UploaderDropzone>

UploaderDropzone has very little opinion about how it looks, and so has only minimal styling. For example, some browsers treat drag events differently when they occur on deep descendants of the target node. To address this problem the dropzone uses a single transparent div to cover all its descendants. This provides the needed experience for users that drag/drop, but also maintains accessibility for screen readers and other assistive technologies.


The UploaderLayoutManager component handles most state transitions and knows which layout should be displayed for each step of the process, while accepting other React Components as props for each step.
Illustration showing a block representing the UploaderLayoutManager and several smaller blocks representing layouts being flipped/shuffled

This allows implementers to think about each step as a separate visual idea without the concern of how and when each transition happens. Supporters of this React component only have to think about which layout should be visible at a given time based on state, not how files are populated or how the layout should look.

Here is a list of steps that can be provided to the LayoutManager as props:

  • Steps managed by LayoutManager:
    • Unpopulated – an empty dropzone with a call-to-action (“Upload a great image!”)
    • File dragged over window but not over dropzone (“Drop that file here!”)
    • File dragged over dropzone (“Drop it now!”)
    • File upload in progress (“Hang on, I’m sending it…”)
  • Step managed by component that implements LayoutManager:
    • File has uploaded and is populated. For our image uploader, this is a preview of the image with a “Remove” button.

The LayoutManager itself has little or no styles, and only displays visuals that have been passed as props. It’s responsible for maintaining which step in the process the user has reached and displaying some content for that step.

The only layout step that’s externally managed is “Preview” (whether the Uploader has an image populated). This is because the implementing component needs to define the state in which the uploader starts. For example, if the user has previously uploaded an image, we want to show that image when they return to the page.
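One way to picture the step selection is as a pure function from a few state flags to the name of the layout to show. This is only a sketch: the flag names and the priority order are assumptions, and the step names are chosen to line up with the element props used in the article's LayoutManager example (`dropzoneElement`, `loadingElement`, and so on).

```javascript
// The state flags and their priority order are assumptions for illustration.
function selectStep({isUploading = false, isOverDropzone = false, isOverWindow = false, hasPreview = false} = {}) {
    if (isUploading) return 'loading';
    if (isOverDropzone) return 'dragDropzone';
    if (isOverWindow) return 'windowDragDropzone';
    if (hasPreview) return 'preview';
    return 'dropzone';
}

// Picks the element passed for the current step, e.g. props.loadingElement.
function selectElement(state, props) {
    return props[`${selectStep(state)}Element`];
}
```

Checking the upload flag first means the loading layout wins even when a preview already exists, which matters if the implementer stores the file before the upload finishes.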

Example of LayoutManager use:

    <UploaderLayoutManager
        dropzoneElement={<DropzoneLayout />}
        windowDragDropzoneElement={<WindowDragDropzoneLayout />}
        dragDropzoneElement={<DragDropzoneLayout />}
        loadingElement={<LoadingLayout />}
        previewElement={<PreviewLayout file={file} />}
    />



Resource-Specific Components

Recap of the React architecture illustration from above with "General Components" highlighted.


The ImageUploader component is geared almost entirely toward presentation: it defines the look and feel of each step and passes them as props into an UploaderLayoutManager. This is also a great place to do validation (file type, file size, etc.).

Supporters of this tool can focus almost entirely on the visual look of the uploader. This component maintains very little logic since state transitions are handled by the UploaderLayoutManager. We can change the visuals fluidly with very little concern about damaging the function of the uploader.

Example ImageUploader:

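import React from 'react';
// Button and UploaderLayoutManager are assumed to come from the component library.

const DropzoneLayout = () => (
    <p>Drag a file here or click to browse</p>
);

const DragDropzoneLayout = () => (
    <p>Drop file now!</p>
);

const LoadingLayout = () => (
    <p>Please wait, loading...</p>
);

const PreviewLayout = ({file, onRemove}) => (
    <div>
        <p>Name: {file.name}</p>
        <Button onClick={onRemove}>Remove file</Button>
    </div>
);

class ImageUploader extends React.Component {
    state = {file: undefined};

    _handleRemove = () => this.setState({file: undefined});

    _handleReceiveFile = (file) => {
        // Assumed step: remember the chosen file so the preview can render.
        this.setState({file});

        return new Promise((resolve, reject) => {
            // upload the file!
        })
            .catch(() => this.setState({file: undefined}));
    };

    render() {
        let {file} = this.state;
        let preview;

        if (file) {
            preview = (
                <PreviewLayout file={file} onRemove={this._handleRemove} />
            );
        }

        // previewElement follows the LayoutManager example; onReceiveFile is an assumed prop name.
        return (
            <UploaderLayoutManager
                dropzoneElement={<DropzoneLayout />}
                dragDropzoneElement={<DragDropzoneLayout />}
                loadingElement={<LoadingLayout />}
                previewElement={preview}
                onReceiveFile={this._handleReceiveFile}
            />
        );
    }
}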

Application-Specific Layer

Recap of the React architecture illustration from above with "Application-specific layer" highlighted.

The example above has one prominent aspect that isn’t about presentation: the file transport that happens in _handleReceiveFile. We want this ImageUploader component to live in our component library and be decoupled from API-specific behavior, so we need to remove that. Thankfully, it’s as simple as accepting a function via props that returns a promise that resolves when the upload is complete.

_handleReceiveFile = (file) => {
    // could do file validation here before transport. If file fails validation, return a rejected promise.
    let {uploadImage} = this.props;

    // Assumed step: remember the chosen file so the preview can render.
    this.setState({file});

    return uploadImage(file)
        .catch(() => this.setState({file: undefined}));
};

With this small change, this same image uploader can be used for a variety of applications. One part of your application can upload images directly to a third party (like Amazon S3), while another can upload to a local server for a totally different purpose and handling, but using the same visual presentation.
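The contract behind that prop can be sketched without React: any function that takes a file and returns a promise will do. In the sketch below, both transport functions and the `createUploader` stand-in are hypothetical, and state lives in a plain object instead of component state.

```javascript
// Hypothetical transports: each takes a file and returns a promise that
// resolves with an identifier once the upload completes.
const uploadToThirdParty = (file) => Promise.resolve(`third-party/${file.name}`);
const uploadToLocalServer = (file) => Promise.resolve(`local/${file.name}`);

// A stand-in for the component's handler, with state kept in a plain object.
function createUploader(uploadImage) {
    const state = {file: undefined, id: undefined};

    const receiveFile = (file) => {
        state.file = file;

        return uploadImage(file)
            .then((id) => {
                state.id = id;
                return id;
            })
            // On failure, clear the file so the uploader returns to its empty state.
            .catch(() => {
                state.file = undefined;
            });
    };

    return {state, receiveFile};
}
```

Because the uploader only sees a promise-returning function, swapping `uploadToThirdParty` for `uploadToLocalServer` changes the destination without touching any presentation code.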

And now because all that complexity is compartmentalized into each component, the ImageUploader has a very clean implementation:

<ImageUploader uploadImage={S3ImageUploadApi} />

With this foundation, applications can use this same ImageUploader in a variety of ways. We’ve provided the flexibility we want while keeping the API clean and simple. New wrappers can be built upon UploaderLayoutManager to handle other file types or new layouts.

In Closing

Imagine image uploaders that are purpose-built for each scenario but contain only a few simple components made of presentational markup. They can each use the same upload functionality if it makes sense, but with a totally different presentation. Or flip that idea around and use the same uploader visuals with totally different APIs.

In what other ways would you use these foundational components? What other uploaders would you build? The sky’s the limit if you take the time to build reusable components.

8 Reasons Why Manual Testing is Still Important

The increase in test automation adoption has unjustly framed manual testing as an archaic and unnecessary practice. After watching an automation suite swiftly execute an entire library of test cases, it can be easy to get tunnel vision about the great benefits of automation. However, the value of manually executing your tests cannot be overstated; here are a few reasons why manual testing is as relevant as ever.

Tape 1: Cycle Times

There’s no way around it; initial automation requires an increased investment in both time and resources. You are setting up a foundation to continually benefit from in your future testing endeavors. However, in some cases, automation will not be the ideal solution for your testing. Attempting to introduce automation close to the end of your testing cycle would be a wasted effort; the time you take to set up (and the sudden resource shift) means you’ll be nearing your release date before you can start running reliable automated tests against core functionality. During that same timeframe, you could be focusing your testing resources on manual execution. Because manual testers spend the majority of their time on test case validation, the end result is more coverage within your test cycle.

Tape 2: Even Your Automation Has Errors

Like any piece of code, your automation will contain errors (and fail). An error-filled automation script may be misinterpreted as failed functionality in your tested application, or (even worse) your automation script may interpret an error as correct functionality. Manually testing your core, critical-path functionality ensures that your test case is passing from a user perspective, with no room for misinterpretation.
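As a small illustration of an automated check that treats an error as correct functionality, consider the sketch below. The page object and both checks are hypothetical; they stand in for whatever a real UI driver would return.

```javascript
// Hypothetical page model returned by some UI driver.
const checkoutPage = {
    status: 200,
    body: 'Something went wrong. Please try again later.',
};

// Flawed check: only confirms the page loaded, so an error page still "passes".
const flawedCheck = (page) => page.status === 200;

// Stricter check: also looks for the content a user would expect to see.
const stricterCheck = (page) => page.status === 200 && page.body.includes('Order confirmed');
```

A manual tester looking at this page would fail the test case at a glance, while `flawedCheck` reports success.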

Tape 3: UI Validations

The advent of automated testing platforms for Responsive and UI testing has provided a much appreciated convenience. However, it should be a boost to your UI testing efforts, not a crutch. These programs validate your test cases by checking element distance, image placement, and alignment of elements in relation to each other. Because of this, there are more than a dozen ways that something such as alignment between a menu and logo can be misinterpreted; a manual tester would immediately be able to catch something that looked “off”, and fail the test case.

Tape 4: Un-Automatable Scenarios

Some scenarios are simply not feasible to automate; they are either genuinely impossible due to technological limitations and the complexity of the scenario, or the resource cost of automating them greatly outweighs the cost of a simple manual test. Case in point: we recently had a customer who needed to test their manual tap-and-pay function for their mobile wallet app. Developing a way to automate this scenario is not worth it when compared to manually testing it with your device.

Tape 5: (Short-Term) Cost

Over time, automation leads to cost savings, faster execution, and continuous testing. In the immediate short term, however, there is an investment cost (and learning curve for the unfamiliar) that can be a situational disadvantage. The cost of setting up and running your initial automation framework can range anywhere from 5-15x the cost of your manual testing endeavors. And as discussed earlier, implementing automation while crunched for time towards the end of a test cycle will not allow you to enjoy automation’s full potential. Choosing to conduct manual testing at this stage provides an immediate, tangible result from your testing resources.

Tape 6: Exploratory Testing

Exploratory testing describes the process of freely testing the application for the purpose of finding defects and subsequently designing new test cases. Defects found through exploratory testing are often the results of testing complex scenarios that would not have been addressed through your predefined test cases. Having a foundation of core, repeatable tests automated will free up time to designate resources towards exploratory testing.

Tape 7: Skills

While the end result of automation is ease, setting up the framework and developing scripts are no easy tasks. An effective automator has a foundation of programming skills, as well as an inherent understanding of test design. These skills are learned over years of experience in both QA and development, and acquiring somebody with these specific skillsets (especially on short notice) is not a simple process. On the other hand, the majority of manual test cases are simple to execute and can easily be taught: follow the steps in the test case, and validate that your actual results are consistent with the expected results.

Tape 8: Agile

In the context of Agile testing, automation is of great benefit. Having a library of tests that can be executed reliably and quickly truly helps with test completion and coverage during a tight sprint. By that same token, manual testing is a quick way to execute any test cases that are not yet automated. There may be no time to build automation for new features introduced in the current build, making manual the best option for test completion.

In conclusion, the need for increased test coverage across an ever-increasing range of software and devices has made test automation more important than ever. As automation continues to grow, it can be easy to forget about the wide spectrum of benefits manual testing still has to offer. Appreciating the value of both approaches will make for a well-rounded testing experience.

Looking under the hood of the Eventbrite data pipeline!

Eventbrite’s mission is to bring the world together through live experiences. To achieve this goal, Eventbrite relies on data-driven decisions at every level. In this post, we explore Eventbrite’s treasure trove of data and how we leverage it to push the company forward. We also take a closer look at some of the data challenges we’ve faced and how we’re pushing forward with new improvements to address these challenges!

The Data Engineering team at Eventbrite ingests data from a number of different sources into a central repository.  This repository is the foundation for Eventbrite’s analytics platform. It empowers Eventbrite’s engineers, analysts, and data scientists to create data-driven solutions such as predictive analysis, algorithmic newsletters, tagging/clustering algorithms, high-value customer identification, and search/recommendations.

The Situation: Degrading performance and increasing cost of existing Data Warehouse running on Hadoop infrastructure (CDH5)

We use MySQL as our main production datastore, and it supported most of our data analytics/reporting until a few years ago. Then we implemented a Cloudera CDH ecosystem, starting with CDH3 and upgrading to CDH5 when it was released. Prior to phasing in our CDH environment, our main OLTP (online transaction processing) databases were also powering our reporting needs.

Our production MySQL topology consists of a Primary-Secondary setup, with the majority of the read-only traffic directed to the MySQL secondary instances. For many years, we met our reporting requirements by leveraging the read-only MySQL instances but it came at a steep price due to contention and performance issues caused by long-running SQL queries.

As a result, we moved much of our reporting to our new CDH environment, and we designed a new set of transformation tables to simplify data access for engineers, analysts, and business users. It served us well as the backbone of our Data Warehouse efforts, but the time had come to take the next step, as we faced a number of challenges:


Our CDH5 cluster lives on Reserved Instances, and all of the data in the cluster is housed on local solid state drives.  As a result, the cluster is expensive to maintain.

A Reserved Instance is a reservation of resources for an agreed-upon period of time. Unlike on-demand, when you purchase an RI (Reserved Instance), you commit to paying for all the hours of the 1-year or 3-year term. The end result is a lower hourly rate, but the long-term costs can really add up.


We have a large collection of uncurated data, and we had not transformed the data into a single source-of-truth about our business. As a result, core business metrics (such as organizer, consumer, and event data) were reported differently in different places in the organization, and attributes such as currency, location and timezone were reported differently across business units.


Most jobs were scheduled via Oozie, there was little effective monitoring in place, and there was no method to track or enforce dependencies between coordinators. In addition, other analytics jobs that utilize Salesforce and MySQL data were scheduled through a local Windows machine that was prone to errors and regularly failed without warning or notification.


All ETL processing and ad-hoc queries ran on the same CDH5 cluster. Each process had its own load profile, so the cluster was configured to fit an aggregate of those loads. The end result was that jobs frequently conflicted with each other and competed for resources.

Our workload required burst capacity to support experimental development, ad-hoc queries, and routine ingestion scripts. In an ideal setup, we would scale up and scale down computing resources without any interruptions or data loss.


For MySQL ingestion, we used a home-grown wrapper called Sqoozie to integrate with our MySQL databases. Sqoozie combines Apache Sqoop – a command-line application for transferring data between relational databases and Hadoop – and Apache Oozie, a Hadoop workflow scheduler. It allows for writing MySQL tables directly to Hive tables. While this approach worked for smaller datasets, it became prohibitive as our data grew. Unfortunately, it was set up as a full ingestion of all tables each day and typically took most of a day to finish, putting high load on the shared resource cluster for an extended period of time.

For web analytics ingestion, we used a proprietary tool called Blammo-Kafka that pulled the web logs directly from Kafka daily and dumped them to Hive tables partitioned by day.

For Salesforce ingestion, we used the Salesforce Bulk API to ingest all objects daily and overwrite the previous day’s ingestion.

The Solution: EMR, Presto, Hive, and Luigi to the rescue!

In the past year, we’ve invested heavily in building a shiny new “data-foundry” ecosystem to alleviate many of the pain points from our previous CDH environment. It is the result of many whiteboard sessions, sleepless nights, and walks around the block at Eventbrite’s offices at our SOMA location in San Francisco and Cummins Station in Nashville.

We focused not only on improving stability and cost, but also on designing a new set of transformation tables that would become the canonical source-of-truth at the company level. This involved meeting with key stakeholders to understand business metrics and exploring new technologies. The following diagram depicts sample output from some of our working sessions. As you can tell, it was a tedious process.

The end result was the implementation of a new “data-foundry” infrastructure. The following diagram shows a general layout:

EMR (Elastic MapReduce) Clusters

Ingestion and ETL jobs run on daily and hourly scheduled EMR clusters with access to most Hadoop tools. Amazon’s EMR is a managed cluster platform that simplifies running big data frameworks such as Hadoop, Spark, Presto, and other applications in the Apache/Hadoop stack.

The EMR/S3 solution decouples storage from compute. You only pay for compute when you use it (high utilization). Multiple EMR clusters can access the data (S3, Hive Metastore), and interactive workloads (Hive, Presto, Spark) can be launched via on-demand clusters.

We’ve seen some benefits with Amazon EMR:

Intelligent resizing

  • Incrementally scale up (add nodes to EMR cluster) based on available capacity
  • Wait for work to complete before resizing down (removing nodes from EMR cluster)
  • Can scale core nodes and HDFS as well as task nodes

Cost Savings

By moving to EMR and S3, we’ve been able to considerably cut costs. With S3 we pay only for the storage that we use, not for total capacity. And with EMR, we’re able to take advantage of “on-demand” pricing, paying low hourly rates for clusters only when we need the capacity. Also, we’ve reduced the cost even further by purchasing Reserved Instances and bidding on Spot instances.

  • Use Amazon EC2 spot instances to save > 80%
  • Use Amazon EC2 Reserved Instances for steady workloads

Reliability/Improved Operational Support

Amazon EMR monitors nodes in each cluster and automatically terminates and replaces an instance if there is a failure. Plus the new environment has been built from scratch, is configured via Terraform, and uses automated Ansible templates.

Job Scheduling

We use Luigi to orchestrate our Python jobs. Luigi enables us to easily define task workflows without having to know much about other workflows. It is an open source Python framework created by Spotify for managing data processing jobs, and it is really good at dependency management, which makes it a perfect tool for coalescing dependent data sources.
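To illustrate the kind of dependency management described here, the sketch below resolves a run order from a task graph so that every task runs after the tasks it requires. This is plain JavaScript and not Luigi's API; the task names and the graph shape are hypothetical.

```javascript
// Hypothetical task graph: each task lists the tasks it requires.
const tasks = {
    ingestMysql: [],
    ingestSalesforce: [],
    buildOrders: ['ingestMysql'],
    buildRevenueReport: ['buildOrders', 'ingestSalesforce'],
};

// Depth-first resolution: returns the target and its dependencies in an order
// where every task appears after everything it requires.
function resolveRunOrder(graph, target, done = new Set(), visiting = new Set()) {
    if (done.has(target)) return [];
    if (visiting.has(target)) throw new Error(`Circular dependency at ${target}`);
    if (!Object.prototype.hasOwnProperty.call(graph, target)) {
        throw new Error(`Unknown task ${target}`);
    }

    visiting.add(target);
    const order = [];

    for (const dependency of graph[target]) {
        order.push(...resolveRunOrder(graph, dependency, done, visiting));
    }

    visiting.delete(target);
    done.add(target);
    order.push(target);

    return order;
}
```

Asking for `buildRevenueReport` yields its upstream ingestion tasks first, which is the property that makes a dependency-aware scheduler useful for coalescing data sources.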

Centralized Hive Metastore

We have a centralized Hive metastore that stores the structural information for our tables, columns, and partitions. We chose Hive for most of our Hadoop jobs primarily because the SQL interface is simple. It is much cleaner than listing files in a directory to determine what output exists, and is also much faster and more consistent because it’s backed by MySQL/RDS. This is particularly important since we rely on S3, which is slow at listing files and is prone to “eventual” consistency issues.


We continue to ingest production data from MySQL tables on a daily basis using Apache Sqoop, but in the “data-foundry” ecosystem we ingest the tables incrementally using “changed” columns to allow for quicker updates.
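The idea behind ingesting incrementally on a “changed” column can be sketched as a watermark: remember the newest change timestamp seen so far and request only rows changed after it. The table name, column name, and helper functions below are hypothetical, and the article does not show the actual Sqoop configuration.

```javascript
// Builds the query for one ingestion run. With no watermark yet, fall back to
// a full pull. String interpolation is for illustration only; real code
// should use parameterized queries.
function buildIncrementalQuery(table, changedColumn, lastWatermark) {
    if (lastWatermark === null) {
        return `SELECT * FROM ${table}`;
    }

    return `SELECT * FROM ${table} WHERE ${changedColumn} > '${lastWatermark}'`;
}

// After a run, the new watermark is the largest changed value seen so far.
// Timestamps in 'YYYY-MM-DD HH:MM:SS' form compare correctly as strings.
function nextWatermark(rows, changedColumn, lastWatermark) {
    return rows.reduce(
        (max, row) => (max === null || row[changedColumn] > max ? row[changedColumn] : max),
        lastWatermark
    );
}
```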

We ingest web analytics data by using Pinterest Secor to dump data from Kafka to S3. We then process it from that S3 path using Spark, both hourly and daily. Hourly, we ingest the latest data for each web analytics table since the last time it was ingested and write it to Hive tables partitioned by day and hour. Daily, we also ingest the web analytics data to day-partitioned Hive tables.

We ingest Salesforce data using a combination of the Salesforce REST and Bulk APIs, with custom, internally built Python clients for both. Tables are ingested through Spark using the API that makes the most sense based on the size of the data. Also, where available, we use primary key chunking in the Bulk API to optimize ingestion of large tables.

In addition to the ingestion processes that bring us to feature parity with the old CDH5 infrastructure, we also ingest data from a few other sources, including Google Analytics and several other 3rd party services.

We ingest Google Analytics data three times a day for the current day and once for the previous day based on SLAs provided by Google. We use Spark in addition to Google’s BigQuery and Cloud Storage clients to ingest Google Analytics data for our mobile app, organizer app, and web app to Hive tables partitioned by day.


By separating analytics processing from visualization and queries, we’ve been able to explore more tooling options. Both Presto and Superset have proven to be useful.  

Presto is a distributed SQL query engine optimized for ad-hoc analysis. It supports the ANSI SQL standard, including complex queries, aggregations, and joins. Presto can run on multiple data sources, including Amazon S3. We’re using Presto with EC2 Auto Scaling Groups to dynamically scale based on usage patterns.

Presto’s execution framework is fundamentally different from that of Hive/MapReduce. It has a custom query and execution engine where the stages of execution are pipelined, similar to a directed acyclic graph (DAG), and all processing occurs in memory to reduce disk I/O. This pipelined execution model can run multiple stages in parallel, and it streams data from one stage to another as the data becomes available. This reduces end-to-end latency, and we’ve found Presto to be quite snappy for ad-hoc data exploration over large datasets.
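As a rough analogy for the pipelined model, the sketch below chains Python generators so that each stage consumes rows as soon as the previous stage produces them. It is single-threaded, so it illustrates streaming between stages but not Presto's parallelism, and the stage names and sample data are hypothetical.

```python
def scan(rows):
    """Stage 1: emit rows one at a time, as a table scan would."""
    for row in rows:
        yield row


def filter_paid(rows):
    """Stage 2: pass along only paid orders as soon as they arrive."""
    for row in rows:
        if row["paid"]:
            yield row


def total_by_event(rows):
    """Stage 3: aggregate in memory while consuming the stream."""
    totals = {}
    for row in rows:
        totals[row["event"]] = totals.get(row["event"], 0) + row["amount"]
    return totals


orders = [
    {"event": "concert", "amount": 30, "paid": True},
    {"event": "meetup", "amount": 10, "paid": False},
    {"event": "concert", "amount": 45, "paid": True},
]

# No stage materializes its full output before the next one starts.
totals = total_by_event(filter_paid(scan(orders)))
print(totals)  # {'concert': 75}
```

Contrast this with a MapReduce-style flow, where each stage would write its complete output to disk before the next stage could begin.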

An additional benefit is that Facebook and the open-source community are actively developing Presto, which has no vendor lock-in because it speaks ANSI-SQL.

Superset is a data exploration and visualization tool that was open sourced by Airbnb. It allows for fast and flexible data access and comes complete with a rich SQL IDE, which is used heavily by Eventbrite’s business analysts.


We’ve introduced a new set of staging tables in our data warehouse that transform the raw data into dimension tables aligned specifically to meet business requirements.  These new tables enable analytics, data science, and reporting. The goal is to create a single “source-of-truth” for company metrics and company business concepts.

Data Exports

The Data Engineering team has developed a set of exporter jobs in Python to push data to targets such as Redis, Elasticsearch, Amazon S3, or MySQL. This allows us to cache the results of queries to power reports, so that the data is available to everyone, whenever it is needed.

What next?

We’re looking for new ways to decrease our ingestion times from MySQL using stream processing with products such as Maxwell, which has been well-documented by Zendesk. Maxwell reads MySQL binlogs and writes row updates to Kafka as JSON. We’re also using SparkSQL and are excited to use Apache Spark more broadly, especially Spark streaming.
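To give a feel for what consuming such a stream might look like, here is a small sketch that replays JSON row events into an in-memory copy of a table. The event fields (`database`, `table`, `type`, `data`) follow the shape described in Maxwell's documentation, but the sample messages, the `id` primary key, and the `apply_event` helper are hypothetical, and no Kafka client is involved.

```python
import json


def apply_event(table_state, raw_message):
    """Apply one JSON row event to a dict keyed by primary key."""
    event = json.loads(raw_message)
    row = event["data"]
    if event["type"] in ("insert", "update"):
        table_state[row["id"]] = row  # assumes `id` is the primary key
    elif event["type"] == "delete":
        table_state.pop(row["id"], None)
    return table_state


# Hypothetical messages, in the order they would arrive on the topic.
messages = [
    '{"database": "shop", "table": "orders", "type": "insert", "data": {"id": 1, "status": "pending"}}',
    '{"database": "shop", "table": "orders", "type": "insert", "data": {"id": 2, "status": "pending"}}',
    '{"database": "shop", "table": "orders", "type": "update", "data": {"id": 1, "status": "paid"}}',
    '{"database": "shop", "table": "orders", "type": "delete", "data": {"id": 2, "status": "pending"}}',
]

orders = {}
for message in messages:
    apply_event(orders, message)

print(orders)  # {1: {'id': 1, 'status': 'paid'}}
```

Because every change arrives as its own event, downstream tables can be kept current without re-reading the source table.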

We have a ton of enhancement requests to extend our Data Warehouse tables to meet the growing needs of the business and to provide better ways of visualizing the data via new Tableau dashboards.

As the Eventbrite family continues to grow with the acquisitions of Ticketscript, Ticketfly, and Ticketea, we continue to explore ways to migrate/combine data sources. This includes ingesting data from sources new to us, such as Amazon Redshift and Amazon Dynamo.

It’s fun times here at Eventbrite!

Special thanks to Eventbrite’s Data Engineering team: (Brandon Hamric, Alex Meyer, Will Gaggioli, Beck Cronin-Dixon, Jasper Groot, Jeremy Bakker, and Paul Edwards) for their contributions to this blog post. This team rocks!

Be the change

Since this is the first post on our blog in Spanish (this article is a translation of Ser el Cambio), we wanted to start off with a bang by featuring one of the most challenging projects we’re facing as a company. Though our offices might not be new, our engineering team is constantly growing, and one of our primary resolutions this year is for our team to keep growing through varied life experiences and different cultures and, above all, to achieve a balance in terms of gender representation.

Our goal is clear… More women in engineering! However, when we sat down to try and figure out how we would achieve this goal, we found that it was a whole lot more than just looking for women to submit their resumes. We discovered that the industry offers little-to-no support for women, leaving them little room to grow and practically no voice when it comes time to make important decisions.

Faced with all of this, we realized that we didn’t want to be a business that just simply tried to put more women to work. In order to tackle this issue more head-on, we developed a working group that we chose to call #ada-lovelace in honor of the great scientist and role model who served as the face for the representation of women in computer science. The idea of this group is to dream up and create a working environment in which all women feel safe, represented, and fully integrated into our business.

The group is made up of men and women who understand that women’s presence and opinions are necessary and important within our company and, above all, within every level of the field of engineering. During most of 2017 and the first part of 2018, this group has taken on the following tasks:

  • Create a space where current and future mothers within our company can feed their babies in total privacy without the fear of being observed.
  • Promote and sponsor groups dedicated to educating women in science and technology, such as Django Girls and Agile Woman.
  • Create confidential working groups within the company where concerns over day-to-day work and workplace issues can be shared in a respectful environment.
  • Ensure that interview panels have women on them for positions of all levels and roles. The idea behind this is that the hiring process shouldn’t be segregated by gender; rather, all interviews should be focused on job-relevant abilities.
  • Create the first “seedlings of engineers” within our company, which will be a school for up-and-coming professionals who may or may not already have professional experience. Within this group, we will strive for a certain percentage representation of women in order to promote their entry into the workforce.

As part of this project, we held interviews with women who are part of our engineering team in order to find out what had made them consider Eventbrite as a possible workplace (Interview in Spanish):

We are fully aware that this is an ongoing effort, and we must continue to make progress in order to solve this problem, which affects not only us as a company, but also the industry and society as a whole. It’s a challenge in and of itself to identify the problems that cause the gender gap to form within the realm of education and training, which then extend to the workplace and the positions that women one day might find themselves in.

You might ask yourselves, Can they pull this off? We’re on the way. Like all big changes, we need time to generate results and see the overarching benefits of this work; however, we’ve begun with the first step in the right direction.

“Never doubt that a small group of thoughtful, committed citizens can change the world; indeed, it’s the only thing that ever has.”

– Margaret Mead

Written in Spanish by Natalia Cortese

Translated into English and published by Melisa Piccinetti

Translation reviewed by Sebastian Torres

Doctor Python: Or How I Learned to Stop Worrying and Love ES6

Have you learned ES6 yet? Oof. When people started asking me that, I’d feel a sense of pressure that I was missing out on something. What was this “ECMA” people kept talking about? I was worried.

But Python helped me learn ES6. Weird, right? Turns out a lot of ES6 syntax overlaps with that of Python, which I learned at Eventbrite. Much of the syntax is shared between the two languages, so they kind of go hand in hand. Kinda.

Without further ado, let’s talk about these two buddies.


Block Scope

When I first started learning JavaScript (back in “ancient” ES5 days), I assumed several things created scope. I thought that conditionals created scope and was quickly told that I was wrong.

“NO. Only functions create scope in JavaScript!”

So when I found out that with ES6, we now have block scope, I was like, “WAT”.

A massive inflatable rubber ducky floating in front of a pier and building.

With the addition of const and let to ES6, block scope! Wow! I felt like I’d predicted the future.

function simpleExample(value) {
  if (value) {
    var varValue = value;
    let letValue = value;
    console.log(varValue, letValue); // value value
  }

  // varValue is available even though it was defined
  // in if-block because it was "hoisted" to function scope
  console.log(varValue); // value

  // letValue is a ReferenceError because
  // it was defined within the if-block
  console.log(letValue); // Uncaught ReferenceError: letValue is not defined
}

What else creates scope in JavaScript, ES6, and Python? And what kind of scope do they use? Check out the following table:

                | JavaScript                                               | Python
Scope           | Lexical                                                  | Lexical
Namespace       | Functions, Classes [ES6!], Modules [ES6!], Blocks [ES6!] | Functions, Classes, Modules
New Identifiers | Variables, Functions                                     | Variables, Functions, Classes
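For comparison, here is a small Python sketch of the same idea. Blocks are not on Python's list of namespaces, so a name assigned inside an if-block belongs to the enclosing function. The function and variable names are made up for illustration.

```python
def simple_example(value):
    if value:
        inner = value  # assigned inside the if-block

    # Python has no block scope, so `inner` is still visible here
    # whenever the if-block actually ran.
    return inner


print(simple_example("value"))  # value

try:
    simple_example(None)  # the if-block is skipped, so `inner` was never bound
except UnboundLocalError as error:
    print(type(error).__name__)  # UnboundLocalError
```

In other words, Python behaves like `var` without the hoisted `undefined`: the name is function-scoped, but reading it before any assignment is an error.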

Template Literals

I like to think of template literals as Mad Libs. Did you have them as a child? Sentences were missing words, and you could write anything you wanted into those spaces. You only had to conform to the specified word type: noun, pronoun, verb, adjective, exclamation.

Mad Libs that read "mothers sit around burmping. Last summer, my little brother fell in a/an hairdo and got poison palmtree all over his butt. My family is going to Winsconsin, and I will.."

Similarly, template literals are string literals that allow embedded expressions. They were originally called “template strings” in prior editions of the ES2015 specification.

Yup, these already exist in Python. I had actually learned about literal string interpolation in Python, which made it that much easier for me to understand in ES6. They are great because you no longer need the ridiculous concatenation found in older versions of JavaScript.

let exclamation = 'Whoa!';
let sentence = `They are really similar to Python.`;

console.log(`Template Literals: ${exclamation} ${sentence}`);
// Template Literals: Whoa! They are really similar to Python.

print('.format(): {} {}'.format('Yup.', 'Quite!'))
# .format(): Yup. Quite!
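Literal string interpolation in Python takes the form of f-strings, which need Python 3.6 or later. They are the closest match to template literals because the expressions sit directly inside the string. A short sketch with made-up values:

```python
exclamation = "Whoa!"
sentence = "They are really similar to template literals."

# An f-string evaluates the expressions inside the braces in place.
message = f"f-strings: {exclamation} {sentence}"
print(message)
# f-strings: Whoa! They are really similar to template literals.

# Any expression works, not only variable names.
count = f"{2 + 3} quick examples"
print(count)  # 5 quick examples
```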


Default Parameters

Yup. Python’s got ‘em too. Default parameters set a default for function parameters. This is most effective for avoiding bugs that pop up with missing arguments.

function nom(food="ice cream") {
  console.log(`Time to eat ${food}`);
}

nom(); // Time to eat ice cream

def nom(food="ice cream"):
  print('Time to eat {}'.format(food))

nom() # Time to eat ice cream

Rest Parameters & *args

Rest parameter syntax allows us to represent an indefinite number of arguments as an array. In Python, they’re called *args, which again, I’d already learned! Are you sensing a pattern here?

Check out how each of the languages bundles parameters up in neat little packages:

function joke(question, ...phrases) {
  console.log(question);
  for (let i = 0; i < phrases.length; i++) {
    console.log(phrases[i]);
  }
}

let es6Joke = "Why does JS single out one parameter?";
joke(es6Joke, "Because it doesn't", 'really like', 'all the REST of them!');

// Why does JS single out one parameter?
// Because it doesn't
// really like
// all the REST of them!

def pirate_joke(question, *args):
  print(question)
  for arg in args:
    print(arg)

python_joke = "What's a Pyrate's favorite parameter?"

pirate_joke(python_joke, "*args!", "*arrgs!", "*arrrgs!")

# What's a Pyrate's favorite parameter?
# *args!
# *arrgs!
# *arrrgs!



Classes

Oh boy, we’re gonna talk about prototypal inheritance now! ES6 classes are actually syntactic sugar and based on the prototype chain found in ES5 and previous iterations of JavaScript. So, what we can do with ES6 classes is not much different from what we do with ES5 prototypes.

Python has classes built in, allowing for quick and easy Object Oriented Programming (Python is down with OOP.). I always found the prototype chain extremely confusing in JavaScript, but looking at Python and ES6 classes side by side really hit home for me.

Let’s take a look at these ES6 “classes” based on the prototype chain:

class Mammal {
  constructor() {
    this.neocortex = true;
  }
}

class Cat extends Mammal {
  constructor(name, years) {
    super();
    this.name = name;
    this.years = years;
  }

  eat(food) {
    console.log('nom ' + food);
  }
}

let fryCat = new Cat('Fry', 7);
fryCat.eat('steak');

class Mammal(object):
  neo_cortex = True

class Cat(Mammal):
  def __init__(self, name, years):
    self.name = name
    self.years = years

  def eat(self, food):
    print('nom %s' % food)

fry_cat = Cat('Fry', 7)
fry_cat.eat('steak')

A big difference between ES6 Classes and ES5 Prototypes: you can inherit more easily with classes than with the prototype chain. This is very similar to Python’s structure. Neato!

So there you have it. Five quick examples of Doctor Python helping me stop worrying and love ES6. It’s been many months now, and my ES6 usage is now pretty explosive.

Screen capture of Major Kong riding on top of a bomb falling from a plane in the film Dr. Strangelove.

Mother May I?

Important announcement about updates to API V3 and evolving permissions at Eventbrite

If you want to skip straight to the content on the changes that will impact our API developers, please visit our Google Group and read the message pinned to the top.

Permissions have been an ever-growing challenge at Eventbrite as we have grown over the years. With scale, permission management has become difficult because of the storage requirements, speed, and latency. Imagine a feature where you need to check the permissions for ten users of an account and 100 of the account’s events. Now take into consideration that each individual event can have multiple permissions associated with it. You can start to get an idea of both the storage requirements and the speed considerations. Even if each permissions check is very fast, executing all of them serially will become slow.

In late 2016 we began working on a Python Library that will change the way we grant users permission to the various entities that exist within Eventbrite. Essentially this library aims to answer the following questions:

    • Can User X take action Y on entity Z? (An entity can be an event, an order, etc.)
    • Which users can take action Y on entity Z?
    • Which entities can User X take action Y on?

When the library is invoked, it is given parameters that indicate which entities/users are relevant *before* checking permissions. If there are 10 events and 5 users you care about, you load permissions for all of these events/users with a one-time SOA call. Once you have loaded permissions this way, checking permissions is very fast because it’s all in memory. Permissions are inferred in a couple of ways. Inference means that not every single permission is stored in the database.

More general permissions imply more specific ones.

If a User has full permissions on an “event”, that implies that the user has permissions to do more specific things to that event (change details, add tickets, etc.). These permissions are not stored, but are inferred.

Permissions on a higher object imply permissions on a lower one:

If a User has full permissions on a “user”, that implies permissions on all the events that are owned by that user.

This allows you to do something like “grant event detail editing” to User Y on User Z. The effect is that User Y can edit the event details for any event owned by user Z.

These can be used powerfully in combination. If you say “grant the event privilege to User Y on User Z” it means that User Y can now do anything to any event owned by User Z.

Inference greatly simplifies operations such as changing the role of a user or bulk deleting events. If we were to store each permission separately, we could potentially need to update thousands of rows in the database to accomplish this one task.
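The two kinds of inference can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library's actual API: the `PermissionSet` class, the privilege names, and the shape of the grant tuples are all hypothetical. Grants are loaded once and every check afterwards runs in memory.

```python
# Hypothetical hierarchy: a general privilege implies these specific ones.
IMPLIED = {
    "event": {"event.edit_details", "event.add_tickets"},
}


class PermissionSet:
    """Grants loaded once, then checked in memory."""

    def __init__(self, grants, event_owners):
        # grants: set of (user, privilege, target) tuples, as a single
        # bulk load might return them.
        # event_owners: maps an event id to the user who owns it.
        self.grants = set(grants)
        self.event_owners = dict(event_owners)

    def _granted_on(self, user, action, target):
        for grantee, privilege, granted_target in self.grants:
            if grantee != user or granted_target != target:
                continue
            # A general privilege implies the more specific actions.
            if privilege == action or action in IMPLIED.get(privilege, ()):
                return True
        return False

    def can(self, user, action, event_id):
        """True if user may take action on the event, directly or by inference."""
        if self._granted_on(user, action, ("event", event_id)):
            return True
        # A grant on the owning user carries over to all of that user's events.
        owner = self.event_owners.get(event_id)
        return owner is not None and self._granted_on(user, action, ("user", owner))


permissions = PermissionSet(
    grants={("user_y", "event", ("user", "user_z"))},
    event_owners={101: "user_z", 102: "user_z", 200: "someone_else"},
)

print(permissions.can("user_y", "event.edit_details", 101))  # True
print(permissions.can("user_y", "event.add_tickets", 102))   # True
print(permissions.can("user_y", "event.edit_details", 200))  # False
```

A single stored grant covers every event owned by user_z, so revoking it or changing a role touches one row instead of one row per event.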

Most recently, the permissions library was used as one part of an update we are making to our product offering. Starting this fall, this update will allow organizers to choose between three tailored product packages to best achieve their goals. You can learn more about the new Eventbrite packages here. The rollout will affect our API developers. If you are an API developer, please visit our Google Group for detailed information on how your apps will be impacted and how you can prepare.

Photo by Christian Wiediger on Unsplash

Britecharts v2.0 Released

Britecharts, Eventbrite’s D3-based charting library, has grown with additional charts contributed by the community. It is now a mature library, but it still lacks some charts used in today’s standard DataViz suites. We want to add these charts, and that means we will experience some growing pains. We wondered: how could we make that process easier?

Packaging and Releasing Private Python Code (Pt.1)

When dealing with a large Python code base managed by multiple teams, you often find that you need to be able to package and release this code independently. Most best-practices guides for releasing Python packages focus on public packages, and do not cover complex dependencies. In this post I’ll focus on how we, at Eventbrite, release our internal Python packages and avoid dependency hell while doing so. This first part will cover defining packages and their dependencies, while the second part will cover building and distributing Python wheels internally.
