Cowboys and Consultants Don’t Need Unit Tests

As a developer, my understanding of and respect for software testing has been slow in coming, because in my previous roles as an engineer and a consultant it wasn't obvious how important testing really is. But over the past year I have finally gained an appropriate respect and appreciation for testing, and it's even improving the way I write code. In this post I will explain where I've come from and how far I've traveled in my testing practices. I'll then list some of the more important principles I've picked up along the way.

Engineers are cowboys … and cowboys don’t need no stinkin’ tests.
I got my start as an Aerospace engineer. And as an engineer, if you do any programming at all, testing is probably not part of it. Why? Because engineers are cowboy coders. As engineering students, we are taught just enough to implement whatever algorithm we have in mind, make some pretty graphs, and then we graduate.

It wasn't much better at my first job. I had shown an interest in software development, and so, on one particular project, I was given the task of reworking and improving the project codebase. We were developing autonomous aircraft control algorithms, and it soon became apparent that after months of work, no one had thought to run the simulation using different starting conditions. When we finally tried different starting conditions, we found that our control system was better at crashing the plane than flying it. This should have been the biggest hint in my early career that testing might be important. But it would still be quite a while before I learned that lesson.


Isomorphic React Sans Node

React is a JavaScript library for building user interfaces that has taken the web development industry by storm. Its declarative syntax and DOM abstraction for components not only make client-side development simple, but also enable server-side rendering of those same components, which improves SEO and initial browser load time. But how do you render JavaScript React components server-side if your backend doesn't run on Node? Learn how Eventbrite successfully integrated React with their Python/Django backend so that you can do the same in yours.

React + ES.next = ❤

JavaScript is evolving quickly. The ES6 specification was released in 2015 and is quickly being implemented by modern browsers. New versions of ECMAScript will now be released on a yearly basis. We can leverage ES6 and functionality slated for future versions right now to write even clearer and more concise React code.

Experience with React will help you get the most out of this session, but you don’t have to have a JavaScript black belt to leave feeling confident in using ES6 with React. Senior Front-End Engineer Ben Ilegbodu covers how to write cleaner code using the new spread operator, classes, modules, destructuring, and other tasty syntactic sugar features being introduced into ECMAScript. Oh, and don’t worry if you don’t understand all of those terms — you soon will after this video.

The Elevator Pitch from a Data Strategist

When people asked what I do for a living at conferences or parties, I told them I run data strategy. Their first response was “oh, that’s cool”. Then they paused for a moment and asked “what do you do exactly?”

After spending fifteen minutes explaining all the aspects of my job, I either totally confused my audience or bored them to death.

So I set out to develop an elevator pitch, something as punchy as “I am a photographer who specializes in marine life”. I thought I could get some help from online job postings. Searching “data strategy” on LinkedIn returned 84 listings. Few of them described what I do. By contrast, the search on “data scientist” returned 40 times more results.

I was not hired based on a job description. I was lucky to convince Eventbrite to create the role for me.

My argument was pretty simple: think of all the data-related challenges the company faces. How many of them are technical, and how many are organizational?

Most data-driven organizations have the following data pipeline.
Data Pipeline

These functions are owned by different groups. Many problems arise from the lack of cross-functional understanding of data. For example, analysts complain about the variations of a single metric. They feel they have little influence over the engineering team to impose consistent tagging. Engineers, on the other hand, complain about analysts' incomprehension of the technical complexity.

Such friction can be reduced by someone who operates on the “full stack” – someone who has the domain knowledge in each area, as well as the organizational skill to connect the dots. A data strategist is one of those people.

Data strategy development reverses the data pipeline. It starts by asking what data and insight are critical to the business's short-, medium-, and long-term growth. Though the cost of storing and processing data keeps falling, the cost of generating insights increases as more and more operations become data driven. Data strategy should be aligned with the business strategy, prioritized to address the biggest opportunities and highest risks. First, then, a data strategist thinks like a business operator.

A business opportunity for a web-based eCommerce company might be developing mobile user experiences (both mobile web and app), while increasing competition from mobile-only competitors poses a significant risk. The data strategist, together with engineering and analytics, decides on the success metrics to measure the new mobile products. The metrics are then translated into data needs, and data needs into technical requirements for tracking. The data strategist can help enforce consistency across devices, so user experiences on different platforms can be compared side by side. Secondly, a data strategist is an analyst. She or he understands what questions to ask, and how data becomes metrics and then insights.

Once the mobile products are available, marketing decides to launch a few mobile campaigns. One of the campaigns is a banner on the homepage, encouraging web users to download the app. Marketing would like to know the effectiveness of this campaign compared to another campaign they purchased from an app ad network. This request poses some real technical challenges: traditional cookie-based web tracking no longer applies to apps. The data strategist works with engineering to explore technical workarounds and evaluates external tools that may provide a solution. She or he also partners with legal to make sure the privacy policy is updated. Thirdly, a data strategist is a technologist. The technical know-how helps in making tradeoffs and avoiding making data collection a burden on engineering resources.

Multiple functions, competing priorities, tight resources, various teams touching one data point… data strategy cannot operate unless cross-functional processes are established. A company-wide data roadmap is an example of such a process. Fourthly, a data strategist is an organizer.

Lastly, a data strategist is a communicator. She or he is the broker of domain knowledge from one function to another. She or he funnels the downstream constraints to the decision makers, and translates the overall strategy down to individual data stakeholders. She or he champions investment in data infrastructure and personnel. When issues arise, she or he assembles a team and coordinates the efforts. Best practices are shared, and training is held to distribute the understanding of data evenly across the organization.

If I had to summarize all these responsibilities in one sentence, I'd say my job is to help organizations manage their data assets and find the best ways to surface insights from the data. Still not as punchy as the underwater photographer. For the fellow data strategists out there: if you have a better version, I would love to hear it!

Engineering + Accounting for Marketplace Businesses

Eventbrite Principal Product Manager Ryan D’Silva and Chief Architect Adam Sussman cover how there’s a deep product need where engineering and finance meet, particularly if you’re a marketplace. While there are solutions available, none do the job particularly well and most marketplaces have built their own solutions at great cost. We’d like to shed some light on the problem and share what we’ve learned so far.

Learning ES6: Generators as Iterators


I feel like all the articles in the Learning ES6 series have been leading up to generators. They really are the feature most JavaScript developers are excited about in ECMAScript 6. They very well may be the future of asynchronous programming in JavaScript. That’s definitely something to get excited about!

Generators can be used both as data producers and data consumers. In this post, we're going to look at how generator functions are a much more convenient way to produce data and create iterators. It's the simpler way to use generators. In the last article we covered iterators & iterables, so you may want to familiarize yourself with those before looking at generators as iterators.

TL;DR

A generator function is a special type of function that, when invoked, automatically generates a special iterator, called a generator object. Generator functions are indicated by function* and make use of the yield operator to indicate the value to return for each successive call to .next() on the generator.

function* range(start, count) {
    for (let delta = 0; delta < count; delta++) {
        yield start + delta;
    }
}

for (let teenageYear of range(13, 7)) {
    console.log(`Teenage angst @ ${teenageYear}!`);
}

Feel free to clone the Learning ES6 Github repo and take a look at the generators code examples page showing them off in greater detail.

Without further ado, let's keep reading.

Quick overview

Generator functions are a new type of function in ES6 that are indicated by function* and return a generator object (which is a specific type of iterator). The heart of a generator function is the yield operator that pauses execution within the generator function:

function* generatorFunc() {
    console.log('before yield');
    yield;
    console.log('after yield');
}

let generator = generatorFunc();

// nothing has happened yet, just have a generator

// output:
// before yield
// {value: undefined, done: false}
console.log(generator.next());

// this will be executed before 'after yield'
// is written to the log
console.log('after first next');

// Output:
// after yield
// {value: undefined, done: true}
console.log(generator.next());

// additional calls to .next() do nothing

// Output:
// {value: undefined, done: true}
console.log(generator.next());

As you can see, calling generatorFunc() doesn’t execute the function. It just returns a generator object which we assign to generator. It’s generator that will allow us to control generatorFunc’s execution. Before calling generator.next(), generatorFunc is kind of in a holding pattern at the beginning of its function body. It’s not until .next() is called that execution begins and continues until the first yield. Everything prior to that first yield is executed (so 'before yield' is logged to console).

An object with a value is returned, but that value is undefined because we haven’t provided an operand for yield. We’ll talk more about yielding values in a bit so it’s ok if that doesn’t make too much sense just yet.

generatorFunc() is now paused at the yield line, right in the middle of the function. Execution now returns back to the main program where 'after first next' is logged to the console.

The subsequent call to generator.next() continues execution in the generator function. 'after yield' is now logged to the console and the function finishes. The call to .next() returns another object, this time with done set to true. Any additional calls to generator.next() have no effect.
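Yielding an actual value is just a matter of giving yield an operand. Here's a minimal sketch (the generator name is made up for illustration):

```javascript
function* valueGeneratorFunc() {
    // the operand of `yield` becomes the `value` in the result object
    yield 42;
}

let valueGenerator = valueGeneratorFunc();

// output: {value: 42, done: false}
console.log(valueGenerator.next());

// output: {value: undefined, done: true}
console.log(valueGenerator.next());
```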

This is the barebones of how generator functions work. A generator object can be created in four ways…

From a generator function declaration (same as example above):

function* generatorFunc() {
    yield;
}
let generator = generatorFunc();

From a generator function expression:

const generatorFunc = function*() {
    yield;
}
let generator = generatorFunc();

From a generator method definition in an object literal:

let someObj = {
    *generatorFunc() {
        yield;
    }
};
let generator = someObj.generatorFunc();

From a generator method definition in a class definition (declaration or expression):

class SomeClass {
    *generatorFunc() {
        yield;
    }
}
let someObj = new SomeClass();
let generator = someObj.generatorFunc();

The most basic form of a generator object acts as a data producer, aka an iterator. It returns a value for each iteration. If you haven’t had a chance to read up on iterators & iterables, you probably should do that first. Everything we’ll cover in this section builds upon that knowledge.

As the article on iterators mentioned, we most likely won’t be implementing iterators directly because of generators. Generator functions make it dead simple to create iterators (although understanding them isn’t quite so simple). All we have to do is write the looping behavior because all generators have built-in implementations for .next() and [Symbol.iterator](). This makes generators both iterators as well as iterables. As a refresher, here’s the iterable interface written using TypeScript:

interface Iterable {
    // default iterator
    [Symbol.iterator]() : Iterator;
}
interface Iterator {
    // next method to continue iteration
    next() : IteratorResult;

    // optional return method
    return?(value? : any) : IteratorResult;
}
interface IteratorResult {
    value : any;
    done : boolean;
}
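We can verify both halves of that claim with a throwaway generator: it has a .next() method (so it's an iterator), and its [Symbol.iterator]() method returns the generator itself (so it's an iterable whose default iterator is itself):

```javascript
function* throwawayFunc() {
    yield 'hi';
}

let throwawayGen = throwawayFunc();

// it's an iterator: it has a .next() method
console.log(typeof throwawayGen.next); // 'function'

// it's an iterable: its default iterator is the generator itself
console.log(throwawayGen[Symbol.iterator]() === throwawayGen); // true
```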

Creating a generator

So how much easier is it to create a generator versus a plain iterator? Well let’s adapt an example from Jason Orendorff’s ES6 In Depth: Generators blog post. Let’s say we want a range() function that will return an iterator that will iterate from the specified start for a specified count number of times:

class RangeIterator {
    constructor(start, count) {
        this.start = start;
        this.count = count;
        this.delta = -1;
    }

    [Symbol.iterator]() { return this; }

    next() {
        this.delta++;

        let value = this.start + this.delta;

        if (value < this.start + this.count) {
            return {value}; // using object literal shorthand
        }
        else {
            return {done: true};
        }
    }
}

// Return a new iterator that will iterate from `start` for
// `count` number of times
function range(start, count) {
    return new RangeIterator(start, count);
}

for (let teenageYear of range(13, 7)) {
    console.log(`Teenage angst @ ${teenageYear}!`);
}

This isn't overly complicated, but there sure is a whole lot of boilerplate to implement the RangeIterator. Generators to the rescue!

// Return a new generator that will iterate from `start` for
// `count` number of times
function* range(start, count) {
    for (let delta = 0; delta < count; delta++) {
        yield start + delta;
    }
}

for (let teenageYear of range(13, 7)) {
    console.log(`Teenage angst @ ${teenageYear}!`);
}

Wow! We just spent many articles going through the syntactic sugar features of ES6, learning how they made our code more succinct. But we just used the new generator functionality to replace 25+ lines of code with only 5! We no longer have to define the RangeIterator class because generator functions automatically create the class for us. And the best part of the generator function implementation is that we get to avoid the weirdness of RangeIterator, where it describes the functionality of a loop without using any loop syntax. It has to use state variables (this.start, this.count & this.delta) to manage the looping behavior across multiple calls to .next(). Generators are much better.

Consuming a generator

In the article on iterators & iterables, we looked at consumers of iterators. Those same consumers work with generators as well since generators are in fact iterators. Let's look at the different ways we can consume the generator created by the following function:

function* awesomeGeneratorFunc() {
    console.log('start');

    console.log('first yield');
    yield 'Generators';

    console.log('second yield');
    yield 'are';

    console.log('third yield');
    yield 'awesome!';

    console.log('all done!');

    return 1000;
}

Consuming a generator manually

As we saw earlier in the quick overview, we can manually consume a generator by calling .next() on it:

let generatorObj = awesomeGeneratorFunc();

// output:
// start
// first yield
// {value: 'Generators', done: false}
console.log(generatorObj.next());

// output:
// second yield
// {value: 'are', done: false}
console.log(generatorObj.next());

// output:
// third yield
// {value: 'awesome!', done: false}
console.log(generatorObj.next());

// output:
// all done!
// {value: 1000, done: true}
console.log(generatorObj.next());

// output:
// {value: undefined, done: true}
console.log(generatorObj.next());

// output:
// {value: undefined, done: true}
console.log(generatorObj.next());

Manually consuming a generator shows the pausing nature of generator functions. We’re just successively calling .next() to keep the example simple, but we could do a whole host of things in between calls to .next() and the generator function would stay “suspended” until a subsequent call to .next().

The only thing really new in this example is that awesomeGeneratorFunc() actually returns a value. But 1000 is not what is assigned to generatorObj; it is still a generator object. 1000 gets set as the value when the generator is done for the first time ({value: 1000, done: true}). Subsequent calls to .next() return an undefined value when done is true. We’ll look at the use case for this return value later on when we look at yield*.

Consuming a generator with a for-of loop

Even though our generator doesn’t actually do any looping (like the range() function from before) it can still be consumed by a for-of loop:

let generatorObj = awesomeGeneratorFunc();

// output:
// start
// first yield
// value: "Generators"
// second yield
// value: "are"
// third yield
// value: "awesome!"
// all done!
for (let word of generatorObj) {
    console.log(`value: "${word}"`);
}

The for-of operator calls .next() on the generatorObj automatically and assigns the value property to word. We see here that for-of consumes the generator until the generator is completed ({done: true}) and then it stops looping. However it doesn’t utilize the 1000 return value at all. It’s also worth pointing out that if we had a break in the loop, the generator never would’ve completed.
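Here's a minimal sketch of that break behavior, using a small stand-in generator (the names are made up for illustration). Breaking out of a for-of loop causes it to call .return() on the generator, which closes the generator for good:

```javascript
function* countToThree() {
    yield 1;
    yield 2;
    yield 3;
    console.log('completed'); // never runs if the loop breaks early
}

let countGen = countToThree();

// output:
// 1
// 2
for (let n of countGen) {
    console.log(n);

    if (n >= 2) {
        break; // for-of calls countGen.return(), closing the generator
    }
}

// output: {value: undefined, done: true}
// the generator is closed, not merely paused
console.log(countGen.next());
```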

Consuming a generator with destructuring

By now you should be familiar with destructuring. If you aren’t, take a look at the destructuring blog post to ramp up. We can use destructuring to consume part of the generator values:

let generatorObj = awesomeGeneratorFunc();

// output:
// start
// first yield
// second yield
let [firstValue, secondValue] = generatorObj;

// output: 'Generators'
console.log(firstValue);

// output: 'are'
console.log(secondValue);

With destructuring we don’t have to consume the entire generator. We can just pull out the values that we care about. In this case we’re only pulling out the first two values so the generator only calls .next() twice. We never see 'third yield' written to the log, proving that the generator is indeed lazy just like iterators.

Consuming a generator with the spread operator

We’ve already learned that we can use the spread operator as a shorthand for converting any iterable into an Array object. A generator object is an iterable too!

let generatorObj = awesomeGeneratorFunc();
let generatedArray = [...generatorObj];

// output:
// start
// first yield
// second yield
// third yield
// all done!
// ['Generators', 'are', 'awesome!']
console.log(generatedArray);

As we can see, the spread operator consumes until completion in order to create the array. Using the new Array.from() static method would also have the same effect and results. We can then utilize all of the methods on an Array object (like .forEach, .map, etc.).
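For comparison, here's a sketch of Array.from() alongside the spread operator, using a small stand-in generator so the example is self-contained:

```javascript
function* wordsFunc() {
    yield 'Generators';
    yield 'are';
    yield 'awesome!';
}

// both consume the generator to completion and build an array
let fromSpread = [...wordsFunc()];
let fromStatic = Array.from(wordsFunc());

// output: ['Generators', 'are', 'awesome!'] (both)
console.log(fromSpread);
console.log(fromStatic);

// and now normal Array methods are available
// output: ['GENERATORS', 'ARE', 'AWESOME!']
console.log(fromStatic.map((word) => word.toUpperCase()));
```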

Generator recursion with yield*

Head hurting yet? If not, it definitely will as we start talking about yield*!

There will be times when we want to combine the values of one or more generators into a single one. Or, we want to factor out generator logic into a separate function so that it can be used multiple times. With “regular” programming we would just create the factored out function and call it as needed.

However, it’s not as simple in generator land. We don’t want to call the helper generator function to get back its return value. If we did that, we’d just get a new generator for that helper generator function. We actually want it to continue to yield into our current generator. What we want to do is delegate the generator’s population to another generator function.

And we can do this using yield*:

function* delegatedGeneratorFunc(start) {
    // yield the first item in the generator
    yield 'before';

    // delegate yielding to `awesomeGeneratorFunc()` which will add
    // 3 more items
    yield* awesomeGeneratorFunc();

    // yield 5th item
    yield 'between';

    // delegate yielding to `range()` which will add 5 items
    // we can pass parameters/variables just like regular functions
    // without `yield*` we'd just get back a new range generator
    // with only `yield`, the generator would be added as 10th item
    yield* range(start, 5);

    // yield 11th and final item
    yield 'after';
}

// quickly see contents of generator by converting to an array
// output:
// ['before', 'Generators', 'are', 'awesome!', 'between', 1, 2, 3, 4, 5, 'after']
console.log([...delegatedGeneratorFunc(1)]);

As you can see, when we call delegatedGeneratorFunc() we end up with a generator that will iterate over 11 items even though only 3 were actually added directly within the function. The other 8 were delegated via yield*: three from awesomeGeneratorFunc() and five from range(). yield* iterates over the generator object returned by the delegated generator functions and then adds them as items to the main generator object.

If you picture a normal generator function building up an array instead of a generator, you can think of yield as calling .push() on an array. If we continue this analogy further, calling yield* is like calling .splice() to add multiple items to the array.
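To make the analogy concrete, here's a hypothetical side-by-side sketch: the generator version next to the array-building version it mimics (all names invented for illustration):

```javascript
function* innerFunc() {
    yield 'b';
    yield 'c';
}

// generator version: `yield` ~ pushing one item, `yield*` ~ pushing many
function* outerFunc() {
    yield 'a';          // like items.push('a')
    yield* innerFunc(); // like items.push(...[...innerFunc()])
    yield 'd';          // like items.push('d')
}

// array-building equivalent for comparison
function buildItems() {
    let items = [];
    items.push('a');
    items.push(...[...innerFunc()]);
    items.push('d');
    return items;
}

// output: ['a', 'b', 'c', 'd'] (both)
console.log([...outerFunc()]);
console.log(buildItems());
```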

As it turns out, yield* isn’t just a generator function delegator. The operand of yield* (i.e. the value to the right) doesn’t have to be a generator object. It can be any iterable.

function* iterableGeneratorFunc() {
    yield 'adios';
    yield* 'hello';  // a string is an iterable!
    yield 'au revoir';
}

// quickly see contents of generator by converting to an array
// output: ['adios', 'h', 'e', 'l', 'l', 'o', 'au revoir']
console.log([...iterableGeneratorFunc()]);

Basically yield* is iterating over the values of an iterable for us and then yielding those values individually. We can more or less mimic yield* using for-of:

function* iterableGeneratorFunc() {
    yield 'adios';

    for (let value of 'hello') {
        yield value;
    }

    yield 'au revoir';
}

One other cool thing about yield* is that it’s one of the few built-in language constructs that uses the value that’s included when an iterator is done. As we saw earlier with awesomeGeneratorFunc() the value returned when a generator is done is specified via return in the generator function. In the case of awesomeGeneratorFunc() it returns the value 1000. Let’s create a generator function that will use the 1000 return value from awesomeGeneratorFunc to help initialize the range() generator.

function* delegatedGeneratorFuncV2() {
    // we're still including the 3 items yielded by awesomeGeneratorFunc(),
    // but we're also saving the return value in a variable
    let start = yield* awesomeGeneratorFunc();

    // we can now use that variable to initialize range()
    yield* range(start, 3);
}

// output: ['Generators', 'are', 'awesome!', 1000, 1001, 1002]
console.log([...delegatedGeneratorFuncV2()]);

Let's wrap up our learnings on yield* with a more concrete example to show the power of generators. It's a binary tree example taken from the Generators chapter of Axel Rauschmayer's Exploring ES6 book.

class BinaryTree {
    constructor(value, left, right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // default `@@iterator` is a generator function so
    // it needs the `*`
    *[Symbol.iterator]() {
        if (this.left) {
            yield* this.left;
        }

        // Let's do infix/in-order iteration
        yield this.value;

        if (this.right) {
            yield* this.right;
        }
    }
}

let tree = new BinaryTree(4,
    new BinaryTree(2,
        new BinaryTree(1),
        new BinaryTree(3)),
    new BinaryTree(5));

// output: [1, 2, 3, 4, 5]
console.log([...tree]);

Now, I don't have time to explain binary tree traversal with recursion. Chances are you've had to write it on a whiteboard during an interview. You can Google it if you're unfamiliar. But doing this sort of recursion in a manually-created iterator would be pretty complicated. Using generators and yield* makes it just as simple as the normal recursive solution would be.

In this example we made a BinaryTree object an iterable by giving it a [Symbol.iterator]() method. We need to prefix the method with * because our implementation is using yield and yield* to return a generator object. Also, because BinaryTree is iterable, we can use yield* to recursively get all of the items in a subtree (this.left or this.right) and add them to the main generator object. And this is all done lazily so the depth-first recursion only goes as deep as the generator is iterated. In this example we’re converting the iterable tree into an array, so we end up traversing the entire tree.

Putting it all together

Ok, we’ve spent a lot of time learning about how generators can be used as iterators. We’ve looked at a lot of simple, dummy examples to help us grasp the underlying concepts without too much logic around it. But in the real-world, our code is primarily logic because we’re trying to accomplish a real task. So let’s try to put what we’ve learned together into something we’re more likely to do on a regular basis.

Let's mimic underscore or lodash. They both have functions that operate on arrays to map, filter, take, etc. They both have a _.chain() method which allows for chaining these functions without creating throwaway intermediary objects. We want to build something similar. However, we're going to leverage the power of generators so that we don't have to work with fully realized arrays. Instead, using generators, we can perform these operations on infinite sequences in a lazy manner.

// Enumerable class that wraps an iterator exposing methods
// to lazily transform the items
class Enumerable {
    constructor(iterator) {
        // assuming iterator is some sort of iterable
        this._iterator = iterator;
    }

    *[Symbol.iterator]() {
        yield* this._iterator;
    }

    // Static (and private) helper generator functions
    static *_filter(iterator, predicate) {
        for (let value of iterator) {
            if (predicate(value)) {
                yield value;
            }
        }
    }
    static *_map(iterator, mapperFunc) {
        for (let value of iterator) {
            yield mapperFunc(value);
        }
    }
    static *_take(iterator, count) {
        let index = -1;
        for (let value of iterator) {
            if (++index >= count) {
                break;
            }

            yield value;
        }
    }

    // Instance methods wrapping functional helpers which allow for chaining
    // The existing iterator is transformed by the helper generator function.
    // The operations haven't actually happened yet, just the "instructions"
    filter(predicate) {
        this._iterator = Enumerable._filter(this._iterator, predicate);
        return this;
    }
    map(mapper) {
        this._iterator = Enumerable._map(this._iterator, mapper);
        return this;
    }
    take(count) {
        this._iterator = Enumerable._take(this._iterator, count);
        return this;
    }
}

function generateStocks() {
    // Returns an infinite generator that keeps on returning new stocks
    function* _generate() {
        for (let stockNo = 1; ; stockNo++) {
            let stockInfo = {
                name: `Stock #${stockNo}`,
                price: +(Math.random() * 100).toFixed(2)
            };

            console.log('Generated stock info', stockInfo);

            yield stockInfo;
        }
    }

    return new Enumerable(_generate());
}

let enumerable = generateStocks()
    .filter((stockInfo) => stockInfo.price > 30)
    .map((stockInfo) => `${stockInfo.name} ($${stockInfo.price})`)
    .take(5);

// Even though `_generate()` is an infinite generator, it's also lazy so
// we only look at enough stocks that are > 30 until we get 5 of them
console.log([...enumerable]);

We’ve basically implemented a (small) portion of lazy.js or RxJs using generators. Congratulations! We’re taking an infinite list of stocks, filtering by the ones that cost over $30, mapping each of those stocks to a display name, and then taking the first 5. Finally we convert that resultant iterator/generator into an array, which we log to the console.

The cool thing about it is that it's lazy. It obviously doesn't create the infinite list of stocks, otherwise it would crash. Instead it only creates enough stocks to get 5 that are over $30. If you run the code, you'll see fewer than a dozen 'Generated stock info' log messages.

The best way to understand how this all works is to work backwards.

Let’s start with .take() (and *_take()). As long as we haven’t gotten to count it yields the value from the iterator. Each iteration in the for-of loop retrieves the next value from its iterator. But that iterator is actually a generator from .map() (and *_map()). So the first value in the for-of loop within *_take() is actually the first value yielded by *_map(), the second value is the second value yielded by *_map(), and so on.

Similarly within *_map(), each iteration in the for-of loop retrieves the next value from its iterator. That value is yielded after transforming it by mapperFunc. And its iterator is actually the generator returned by .filter() and *_filter(). So the first value in the for-of loop within *_map() is actually the first value yielded by *_filter(), and so on.

*_filter() uses a for-of loop to iterate over the values of its iterator and only yields a value if the predicate function returns true. Well that iterator is the generator object returned by generateStocks(). Each iteration of the for-of loop is pulling out values from the generator, which generateStocks() is yielding with a random price.

The reason the program doesn't crash, even though generateStocks() will continue to yield random stocks as long as they are requested, is that *_take() quits yielding values after it has reached count number of values. Because no more yields happen, the chain reaction ends and generateStocks() stops yielding random stocks.
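To see the chain reaction in miniature, here's a stripped-down two-stage sketch (hypothetical names) pairing an infinite source with a take-style consumer:

```javascript
// infinite producer, like generateStocks()
function* sourceFunc() {
    for (let i = 1; ; i++) {
        console.log(`source produced ${i}`);
        yield i;
    }
}

// stops pulling after `count` values, like *_take()
function* takeFunc(iterator, count) {
    let index = -1;
    for (let value of iterator) {
        if (++index >= count) {
            break; // no more pulls, so the infinite source stops too
        }
        yield value;
    }
}

// output:
// source produced 1
// source produced 2
// source produced 3  (the third pull happens before the break check)
// [1, 2]
console.log([...takeFunc(sourceFunc(), 2)]);
```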

Sweeet!

JavaScript engine support

According to the ECMAScript 6 Compatibility table, only Safari 9 doesn’t support generator functions. All other modern browsers and engines support them.

Additional resources

As always, you can check out the Learning ES6 examples page for the Learning ES6 Github repo where you will find all of the code used in this article running natively in the browser. You can also get some practice with ES6 classes using ES6 Katas.

For more on using generators as iterators feel free to read:

Coming up next…

So we just looked at how we can use generator functions to easily create generator objects that are iterators. But that’s only half the story! Generator objects not only can act as data producers (aka iterators), but they can also act as data consumers (aka observers). Up next, we’ll continue our deep dive into generators, looking at how we can use them to consume data. This is where the asynchronous magic really happens. I had initially planned to just do one big blog on generators that covered both sides, but it’s clearly too big for just one article. Even this blog post could’ve been split into two.

Until then…

FYI

This Learning ES6 series is actually a cross-posting of a series with the same name on my personal blog, benmvp.com. The content is pretty much the exact same except that this series will have additional information on how we are specifically leveraging ES6 here in Eventbrite Engineering when applicable. The generators as iterators blog post can be found here.

Learning ES6: Iterators & iterables


We’ve talked about promises and new collection APIs, so now we’re finally going to talk about iterators & iterables. They’ve come up in passing in the last couple of posts, so it’s about time we talk about them deeply.

TL;DR

Iterators provide a simple way to return a (potentially unbounded) sequence of values. The @@iterator symbol is used to define default iterators for objects, making them an iterable.

class MyIterator {
    constructor() {
        this.step = 0;
    }
    [Symbol.iterator]() {
        return this;
    }
    next() {
        this.step++;

        if (this.step === 1)
            return {value: 'Ben'};
        else if (this.step === 2)
            return {value: 'Ilegbodu'};

        return {done: true};
    }
}

let iter = new MyIterator();

// output: {value: 'Ben'}
console.log(iter.next());

// output: {value: 'Ilegbodu'}
console.log(iter.next());

// output: {done: true}
console.log(iter.next());

// output: {done: true}
console.log(iter.next());

The iterator & iterable protocols are based on the following duck-typed interfaces (expressed in TypeScript for clarity):

interface Iterable {
    [Symbol.iterator]() : Iterator;
}
interface Iterator {
    next() : IteratorResult;
    return?(value? : any) : IteratorResult;
}
interface IteratorResult {
    value : any;
    done : boolean;
}

All the collection types (Array, Map, Set, etc.) have default iterators designed for easy access to their contents. Strings also have a default iterator so it’s easy to iterate over the code point characters of the string (rather than the code unit characters).

// (illustrative string; the point is that '𠮷' is a single code
// point made up of two code units)
let str = '𠮷a';

for (let char of str) {
    console.log(char);
}

// output:
// 𠮷
// a

Iterables are important to know because a lot of the APIs moving forward will accept iterables instead of just arrays for greater flexibility. Iterators are helpful to know because they serve as the basis for generators, which open new doors to asynchronous programming. Be sure to clone the Learning ES6 Github repo and take a look at the iterators & iterables code examples page showing off the features in greater detail.

Let’s get this party started.

Iterables

The reason the for-of loop can work on Array, Map, Set, String, arguments, etc. is because they are all iterables. An iterable is an object that intends to make its sequential elements publicly accessible through iteration interfaces. This object does so by implementing the default @@iterator method using the well-known Symbol.iterator symbol. We’ll talk about ES6 symbols more in a future post.

The default @@iterator returns an object that implements the iterator “interface” (explained further below), which is the actual object that for-of and other iteration features use to iterate. This means that you can create your own objects that implement the iterable “interface” via duck-typing.

Here’s the iterable interface explained using TypeScript:

interface Iterable {
    // default iterator
    [Symbol.iterator]() : Iterator;
}

You may already be familiar with iterables if you’ve used C#’s IEnumerable or Java’s Iterable. We’ll explore the Iterator interface in the section on iterators below.

Why Symbol.iterator?

You can think of the default Symbol.iterator() method just like the default toString() method. toString() provides a custom way to serialize any object to a string. Symbol.iterator() provides a custom way to iterate over an object.

The TC-39 committee chose Symbol.iterator() for backwards compatibility. They could’ve chosen a friendlier name like iterator() or iter() to be more like toString(), but there was a good chance that there would be existing JavaScript code out in the wild using those method names. That code of course would be doing something different, so when it ran on an ES6 JavaScript engine, it would break. As we’ll learn in a future article, Symbols are new to ES6 as well and are guaranteed to be unique. Therefore there was no possibility of existing code having a naming conflict. The toString() method has existed in the language from the very beginning.
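To see why uniqueness matters, consider a hypothetical pre-ES6 object that already defined its own string-keyed iterator() method; the symbol key cannot collide with it:

```javascript
// Hypothetical pre-ES6 object with a string-keyed `iterator` method;
// the symbol-keyed default @@iterator coexists with it because
// symbols are guaranteed to be unique
let legacyObject = {
    iterator() {
        return 'legacy method, unaffected';
    },
    [Symbol.iterator]() {
        let done = false;
        return {
            next() {
                if (done) return {done: true};
                done = true;
                return {value: 'only value', done: false};
            }
        };
    }
};

// output: legacy method, unaffected
console.log(legacyObject.iterator());

// output: ['only value']
console.log([...legacyObject]);
```

Had TC-39 chosen the string name `iterator` instead, spreading this object would have called the legacy method and broken the existing code.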

Using the default iterator

Ok, ok. Enough exposition. Let’s look at some real code to hopefully help make this more clear.

By now we should be quite familiar with how the for-of operator works:

let values = ['alpha', 'beta', 'charlie'];

for (let value of values) {
    console.log(value);
}

It iterates over the values array assigning each value to the value variable. Well, what for-of is doing is accessing the default Symbol.iterator() and iterating until the iterator says it is done (example adapted from Nicholas C. Zakas in Iterators and Generators):

let values = ['alpha', 'beta', 'charlie'];
let defaultIterator = values[Symbol.iterator]();

// output: {value: 'alpha', done: false}
console.log(defaultIterator.next());

// output: {value: 'beta', done: false}
console.log(defaultIterator.next());

// output: {value: 'charlie', done: false}
console.log(defaultIterator.next());

// output: {value: undefined, done: true}
console.log(defaultIterator.next());

We’ll go into this in more depth in the section on iterators, but the .next() method on an iterator object returns an object containing the value of that iteration and whether or not the iteration is done. When for-of receives {done: true} it stops iterating.
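In other words, a for-of loop is roughly equivalent to driving the iterator by hand. This is a simplified sketch; real transpiled output also handles early exits via the iterator’s return() method.

```javascript
// roughly what `for (let value of values) ...` does under the hood
let values = ['alpha', 'beta', 'charlie'];
let iterator = values[Symbol.iterator]();
let result = iterator.next();
let collected = [];

while (!result.done) {
    collected.push(result.value); // the loop body sees `result.value`
    result = iterator.next();
}

// output: ['alpha', 'beta', 'charlie']
console.log(collected);
```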

One cool application of the default Symbol.iterator() is to make the array-like jQuery object an iterable (thanks to Jason Orendorff in ES6 In Depth: Iterators and the for-of loop):

jQuery.prototype[Symbol.iterator] = Array.prototype[Symbol.iterator];

The jQuery object is already array-like so we give it the same default iterator that Array has. Now it can be used with for-of instead of relying only on its .each() method.

Lastly, because all iterables implement the default Symbol.iterator() method, it makes it super easy to detect if an object is an iterable:

function isIterable(obj) {
    return obj && typeof obj[Symbol.iterator] === 'function';
}

// output: true
console.log(isIterable(['alpha', 'beta', 'charlie']));

// output: true
console.log(isIterable('Ben'));

// output: true
console.log(isIterable(new Set()));

At this point, if you wanted, you could stop reading. A solid understanding of iterables is all you really need. But if you want to get a slightly deeper understanding of iterators to prepare yourself for generators, please keep on reading!

Iterators

An iterator is a pointer for traversing the elements of a data structure. This type of object exists in most programming languages including C# (IEnumerator), Java (Iterator) or Python (iterator). Instead of having a full list, an iterator walks one by one through a sequence. And that sequence could be unbounded such that it never terminates.

Let’s look at a simple example (adapted from Axel Rauschmayer in Iterables and iterators):

class MyIterator {
    constructor() {
        this.step = 0;
    }
    next() {
        this.step++;

        if (this.step === 1)
            return {value: 'Ben'};
        else if (this.step === 2)
            return {value: 'Ilegbodu'};

        return {done: true};
    }
}

let myIter = new MyIterator();

// output: {value: 'Ben'}
console.log(myIter.next());

// output: {value: 'Ilegbodu'}
console.log(myIter.next());

// output: {done: true}
console.log(myIter.next());

// output: {done: true}
console.log(myIter.next());

On the surface, this doesn’t look too special. The MyIterator instance is doing exactly what the class is defining. When we call .next() the first time we get an object with value of 'Ben'. The next time the object contains value of 'Ilegbodu'. Every time after that we just get {done: true}.

But that’s all an iterator is: an object that returns a value each time .next() is called, or indicates that it is done. Technically, the object returned by .next() should include both value and done, but done can be omitted when it is false and value can be omitted when it is undefined. Also, the iterator doesn’t have to be a class instance as we’ve done with MyIterator. It can just be a plain JavaScript object that has a .next() method.

This ES6 iterator interface is also a bit different from those of other languages that support iterators. In C#, IEnumerator has a MoveNext() method to advance to the next item in the sequence; it returns false when the sequence is over, and a separate Current property contains the value. The Java Iterator has a next() method that returns the next value in the sequence and a hasNext() method that must be called to see if any items remain. Python’s iterator has a next() method that also returns the next value in the sequence, but raises a StopIteration exception when no values remain.

Iterators + iterables

The “magic” of iterators comes to life when we want to use one in a construct that consumes iterables, such as the for-of loop. First we need an iterable, created by giving an object a default @@iterator:

let myIterableSequence = {
    [Symbol.iterator]() {
        return new MyIterator();
    }
};

In the example, the iterable (myIterableSequence) is just a plain JavaScript object instead of a class instance. It uses computed property keys added in ES6 to quickly define the default @@iterator. All it does is return a MyIterator instance.

Now check out what happens when we use myIterableSequence in a for-of loop:

// output:
// Ben
// Ilegbodu
for (let item of myIterableSequence) {
    console.log(item);
}

The for-of loop starts by calling the default @@iterator method on myIterableSequence to get an iterator. It then calls .next() on that iterator to get each value, which is subsequently assigned to the item variable, and continues until the iterator says it’s finished by returning {done: true}. The for-of loop is basically just a series of method calls on an iterator underneath; that’s exactly what it gets transpiled down to in ES5 (with optimizations for handling arrays).

You know, instead of creating the wrapper object (myIterableSequence) to create an iterable, we could instead make the iterator itself iterable by implementing the default @@iterator and returning itself:

class MyIterator {
    constructor() {
        this.step = 0;
    }
    [Symbol.iterator]() {
        return this;
    }
    next() {
        this.step++;

        if (this.step === 1)
            return {value: 'Ben'};
        else if (this.step === 2)
            return {value: 'Ilegbodu'};

        return {done: true};
    }
}

let myIter = new MyIterator();

// output:
// Ben
// Ilegbodu
for (let item of myIter) {
    console.log(item);
}

Now our iterator object can be used directly in constructs like for-of that only work with iterables. We’ll see more examples of this as we move forward.

Formal iteration protocol

The full iteration protocol is as follows (once again using TypeScript for clarity only):

interface Iterable {
    // default iterator
    [Symbol.iterator]() : Iterator;
}
interface Iterator {
    // next method to continue iteration
    next() : IteratorResult;

    // optional return method
    return?(value? : any) : IteratorResult;
}
interface IteratorResult {
    value : any;
    done : boolean;
}

Technically the Iterator interface also includes a throw() method, but it’s only used with generators (and yield*), and even then it’s optional. Chances are you’ll never implement it yourself.

But an iterator can implement the optional return() method. It’s optional because it’s not always called. But when it is, it makes the iterator closable. A for-of loop will call return() if the loop exits because of a return, break or an exception. This is really a hook for the iterator to do any cleanups before it’s no longer used. It typically will return {done: true, value: x} where x is the last value returned by next(). If return() doesn’t return an object an error is thrown.
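Here’s a quick sketch of a closable iterator; the cleanup bookkeeping (the closedEarly flag) is illustrative.

```javascript
// an iterator/iterable whose optional return() method records
// that it was closed early; the variable names are illustrative
let closedEarly = false;
let step = 0;

let countToTen = {
    [Symbol.iterator]() {
        return this;
    },
    next() {
        step++;
        return step <= 10 ? {value: step, done: false} : {done: true};
    },
    return(value) {
        // called by for-of on break, return, or an exception;
        // a real iterator would release resources here
        closedEarly = true;
        return {value, done: true};
    }
};

for (let num of countToTen) {
    if (num === 3) {
        break; // triggers return() above
    }
}

// output: true 3
console.log(closedEarly, step);
```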

Lazy iterators

Because the only way to get values out of an iterator is one-by-one using next(), iterators can be lazy. They don’t have to generate their values until the next value is needed. This can open up a number of cool possibilities.

The first possibility is that we can now have sequences whose values are the result of computationally expensive operations. Up until now, the only way to easily have a sequence was to have an array with all of the values precomputed, which isn’t always feasible from a performance standpoint. Take the jQuery object we alluded to earlier: when you get a jQuery object as the result of a selection (such as $('p')), it maintains an array of the matching DOM nodes. However, when you call .each() on the object, each item is a regular DOM node, presumably because it would be too expensive to wrap every matching node in its own jQuery object.

In our example before, we assigned the jQuery object’s default @@iterator to be Array’s default @@iterator so that it could be used in a for-of loop. But this still results in regular DOM nodes on each iteration. However, if the jQuery object implemented a custom default @@iterator, it could wrap each matching node in a jQuery object, but only when the next value is requested via .next(). It would create the jQuery objects on demand; if you never loop over the matches, the wrapped objects never need to be created.

// loop over all <ul> tags
for (let uList of $('ul')) {
    // `uList` is already a jQuery object.
    // No need to do `$(this)`
    // for each uList (now a jQuery object) loop through <li>
    for (let listItem of uList.find('li')) {
        // `listItem` is also a jQuery object
        console.log(listItem);
    }
}

Wouldn’t this be so nice? Because we use for-of instead of .each() we no longer have to deal with a callback function either. And because the DOM nodes are wrapped jQuery objects we no longer have to do that initial $(this) step. Kudos to Nicolas Bevacqua in ES6 Iterators in Depth for the idea.

Another possibility with lazy iterators is infinite sequences. This is an iterator that will never return {done: true} to signal that the sequence is over. Each call to .next() will always return a value. A perfect example of an infinite sequence is the Fibonacci sequence (borrowed from Luke Hoban):

let fibonacci = {
    [Symbol.iterator]() {
        let previous = 0, current = 1;
        return {
            next() {
                [previous, current] = [current, previous + current];
                return {value: current};
            }
        };
    }
};

for (let number of fibonacci) {
    // stop after the number is greater than 1000
    if (number > 1000)
        break;

    console.log(number);
}

You see? The iterator never returns {done: true}, making the sequence infinite. If we tried to create an array from this infinite sequence, the program would never finish building it. Therefore, when consuming an infinite sequence in a for-of loop, we must eventually return or break, otherwise the loop will never end.

Built-in iterators

As mentioned, for-of works with a lot of native objects because they have default @@iterator methods defined. Collections have additional iterator methods: .entries(), .values() and .keys(). Check out the article on the new collections added in ES6 for more details.

Other consumers of iterators

The for-of operator isn’t the only construct that makes use of iterators.

Array.from

ES6 added the Array.from() static method that converts any iterable or array-like object into an actual array:

let array = Array.from(iterable);

Because Array.from() creates an array from the iterable, the iterable cannot be infinite; you cannot have an infinite array.

Spread operator

As we learned in the parameter handling article, the spread operator can be used to insert values of an iterable into an array:

let array = ['a', ...iterable, 'z'];

You can also use the spread operator to mimic Array.from():

let array = [...iterable];

Lastly, you can turn an iterable into individual arguments of a function call:

foo(...iterable);

So if iterable were an iterator, the spread operator would just call .next() until it received {done: true}, and each of the values would become a separate argument in the function call! But once again, infinite iterables will not work with the spread operator, because it reads until it receives {done: true}, which will never be returned by an infinite iterator.
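For example, spreading a hand-rolled (finite) iterator into a function call works exactly this way; the names here are made up for illustration.

```javascript
// a minimal finite iterator/iterable; spread calls .next()
// on it until it receives {done: true}
function firstThree() {
    let n = 0;
    return {
        [Symbol.iterator]() { return this; },
        next() {
            n++;
            return n <= 3 ? {value: n, done: false} : {done: true};
        }
    };
}

function sum(a, b, c) {
    return a + b + c;
}

// each yielded value becomes a separate argument: sum(1, 2, 3)
// output: 6
console.log(sum(...firstThree()));
```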

Array destructuring

Destructuring actually allows us to pull out values from any iterable. When we initially learned about destructuring, we only focused on arrays. Imagine we had our fibonacci iterable example from earlier. It’s neither an array nor finite, yet it can be a part of destructuring:

let [, secondFib, , fourthFib] = fibonacci;

// output: 2, 5
console.log(secondFib, fourthFib);

The code is simply extracting the 2nd and 4th Fibonacci numbers from the fibonacci iterable. But what’s happening is that destructuring is calling .next() on the iterable only four times. That’s how destructuring can work with infinite iterables. The first call to .next() returns the first Fibonacci number, but we aren’t actually consuming it into a variable. It’s the second number we want, so it calls .next() again, retrieves the value and assigns it to secondFib. The third number returned by the third call to .next() isn’t consumed, and then finally the fourth call to .next() assigns the value to fourthFib.

Destructuring and lazy iterators work very well together.

Map & Set constructor

The Map constructor converts an iterable of [key, value] pairs into a Map object:

let map = new Map(iterable);

The Set constructor converts an iterable of values into a Set object:

let set = new Set(iterable);

And as we learned, because the Map & Set objects are themselves iterables, we can use their constructors to clone them. No infinite iterators allowed here either.

Promise.all & Promise.race

Promise.all() and Promise.race() both accept iterables of Promise objects (technically thenables), not just arrays. So if you had a Set of thenables, you could pass it directly to either of those static methods without doing any array conversion. In theory an infinite iterable would work with Promise.race(), since it settles as soon as the first promise settles, but in practice it reads the entire iterable synchronously before any asynchronous result comes back, so it would never finish reading.

yield*

We haven’t talked about yield* yet because it’s used with generators, which we’ll deep-dive into in the next post.

Combinators

Combinators are functions that manipulate iterables to create new ones. If you’re familiar with LINQ or RxJS, you’ve dealt with combinators before. To see how they work, let’s define our own: a take(iterable, count) combinator function that returns a new iterable over the first count items of iterable (adapted from an example by Axel Rauschmayer in Iterables and iterators):

function take(iterable, count) {
    // get default `@@iterator` from original iterable
    let iterator = iterable[Symbol.iterator]();

    // return new (anonymous) iterable
    return {
        next() {
            // implementing `next()` makes it an iterator

            if (count > 0) {
                // if there are items remaining, return the next
                // one from the iterable
                count--;

                // return the value from the original iterable's iterator.
                // if it has fewer values than `count`, this will just
                // return `{done: true}` early!
                return iterator.next();
            }
            else {
                // otherwise just say we're done
                return {done: true};
            }
        },
        [Symbol.iterator]() {
            // implementing default `@@iterator` makes it an iterable
            return this;
        }
    };
}

// output: [1, 2, 3, 5, 8, 13]
console.log(Array.from(take(fibonacci, 6)));

We were able to create an array of the first 6 Fibonacci numbers by using the take() combinator function. We take the infinite iterable fibonacci and pass it to take() which really just returns a new iterable. Nothing else has happened yet. We basically have a new iterable that has “instructions” to get the first 6 items from the fibonacci iterable, but it hasn’t done it yet because it’s lazy and hasn’t been instructed to do so.

Array.from() consumes this new iterable and runs it to completion. Unlike fibonacci, this new iterable is finite and returns just 6 values: the first 6 items from fibonacci. After that it says it’s done, and Array.from() returns an array with six elements.

ES6 doesn’t have a native iterable object like C#’s IEnumerable or Java’s Iterable: a special type of iterable with combinator methods that return new iterables. These may come in a future version of ECMAScript. The goal with ECMAScript 6 was to standardize the iteration protocol and then survey the landscape for the sorts of libraries that pop up based on it. The most useful pieces could then be folded in for native support.

JavaScript seems to be moving more towards functional programming over object-oriented programming. So it’s possible that instead of having a native iterable object with combinators methods like C# & Java have, there may be a native set of modules with a bunch of combinator functions (like take() above) that can be used with iterables.
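For instance, here’s a second combinator in the same hand-rolled style as take(); the name mapIterable() is made up for illustration.

```javascript
// hypothetical mapIterable() combinator: lazily transforms each
// value of an iterable, written in the same style as take()
function mapIterable(iterable, mapperFunc) {
    let iterator = iterable[Symbol.iterator]();

    return {
        next() {
            let {value, done} = iterator.next();

            // pass through the end-of-sequence signal untouched
            return done ? {done: true} : {value: mapperFunc(value)};
        },
        [Symbol.iterator]() {
            return this;
        }
    };
}

// output: [2, 4, 6]
console.log(Array.from(mapIterable([1, 2, 3], (n) => n * 2)));
```

A hypothetical set of combinator modules would just be a collection of small functions like this, each taking an iterable and returning a new lazy one.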

Generators

We’ve really only scratched the surface of iterators. For instance, we’ve barely touched on implementing the return() and throw() methods. The main reason is that we’re unlikely to implement iterators manually in practice. Writing iterators that adhere to the correct behavior is a bit difficult, which is why ES6 also provides generators. Instead of implementing an iterator object from scratch, we’ll most likely use generator functions, which create generator objects: special iterator objects that unlock a host of additional functionality.

JavaScript engine support

According to the ECMAScript 6 Compatibility table, all modern browsers support iterators and iterables. That shouldn’t really come as a surprise since we’ve already learned that they all support the for-of loop and the new collections.

Additional resources

As always, you can check out the Learning ES6 examples page for the Learning ES6 Github repo where you will find all of the code used in this article running natively in the browser. You can also get some practice with ES6 classes using ES6 Katas.

This post only covered the parts of iterators I considered the most useful to know. In my opinion, iterables were the key learning from this post, but you have to know something about iterators for iterables to make sense. However, if you really want to know all of the ins and outs of the iteration protocol, there are some additional resources you can read:

Coming up next…

All this learning about promises, the for-of loop, and today’s article on iterators & iterables has been setting the stage for a discussion of generators, the next generation of asynchronous programming. The yield keyword helps create generators and is something I saw in C# years ago and always wanted to learn about. Now it’s in JavaScript! Until then…

Learning ES6: New Collections


Let’s continue focusing on the new functionality introduced with ES6 in the Learning ES6 series. The main focus in the next few articles will be all about asynchronous programming. We’ll ultimately talk about generators, but there are a few building blocks we need to get through first. The new collections we’ll talk about now aren’t really building blocks for generators, but I feel that they are important to learn. In addition, they are types of iterables which we’ll deep dive into in the next article.

TL;DR

ES6 introduces four new efficient collection data structures to mitigate our abuse of object and array literals.

A Set contains a unique set of values of any type:

let set = new Set([true, 'Ben', 5]);

set.add(false).add('Ilegbodu').add(25).add(true);

// output: 6
console.log(set.size);

// output: true
console.log(set.has('Ben'));

Map provides a mapping of keys of any type to values of any type:

let map = new Map();

map.set('foo', 'bar');
map.set(true, 'Ben'); // non-strings can be keys

// output: Ben
console.log(map.get(true));

// output: 2
console.log(map.size);

WeakMap provides memory leak-free lookup of objects to values of any type:

let $leftButton = $('#leftButton');
let domMetadata = new WeakMap();

domMetadata.set($leftButton, {clickCount:0});

WeakSet provides memory leak-free collection of unique objects:

let $leftButton = $('#leftButton');
let clickedDomNodes = new WeakSet();

clickedDomNodes.add($leftButton);

The differences between the 4 collection types are subtle but important. Be sure to clone the Learning ES6 Github repo and take a look at the new collections code examples page showing off the features in greater detail.

To learn how to use these collections you could just read documentation because they are just new APIs. However, to know why you would want to use each one, I suggest you keep reading.

Map

You may be thinking: why do I need Map when I can just use a regular ol’ object? At first glance it certainly does look just like an object literal. But up until now we’ve been abusing JavaScript objects as maps. They were intended for holding loosely structured, arbitrarily nested data, much like XML. But when there’s only one level indexed by string keys, they basically look like maps or lookup tables.

ES6 now introduces a true map data structure appropriately called Map.

Constructor

The Map constructor takes an optional array of [key, value] pairs that are added when the Map is created. If you omit the array, an empty Map object is created.

let allStarVotesEmpty = new Map();

let steph = new Player('Stephen Curry');
let kobe = new Player('Kobe Bryant');
let lebron = new Player('LeBron James');

let allStarVotesInitialized = new Map([
    [steph, 50],
    [kobe, 0],
    [lebron, 22]
]);

Right now, there doesn’t seem to be much difference between an object literal and a Map. In fact the Map seems like more syntax. But that’s about to change…

Handling values

One limitation of using object literals as maps that you may have run into is that object literals only support using strings as keys. Any key you set on an object literal that is not a string will get coerced into one.

let steph = new Player('Stephen Curry');
let kobe = new Player('Kobe Bryant');
let lebron = new Player('LeBron James');

// Build up votes lookup table using
// ES6 computed property keys
let allStarVotes = {
    [steph]: 50,
    [kobe]: 0,
    [lebron]: 22
};

// output: true
// the player objects are all coerced to the
// string "[object Object]"
console.log('[object Object]' in allStarVotes);

Another issue is the looseness of determining whether a key is part of a JavaScript object. A truthy check doesn’t work if the key’s value can be falsy:

let allStarVotes = {
    'Stephen Curry': 50,
    'Kobe Bryant': 0,
    'LeBron James': 22
};

// truthy check doesn't work because 0 is a
// valid value and is falsy
if (allStarVotes['Kobe Bryant']) {
    console.log('Kobe Bryant is a candidate');
}

Getting the size of the object is also neither straightforward nor efficient. The quickest way is to get the length of its array of keys:

let allStarVotes = {
    'Stephen Curry': 50,
    'Kobe Bryant': 0,
    'LeBron James': 22
};
let numCandidates = Object.keys(allStarVotes).length;

Using vanilla JavaScript objects is also susceptible to a security issue because you could unintentionally overwrite properties inherited from Object.prototype (such as toString):

let allStarVotes = {
    'Stephen Curry': 50,
    'Kobe Bryant': 0,
    'LeBron James': 22
};
allStarVotes.toString = 'overwritten';

// Error!
// toString is not a function
console.log(allStarVotes.toString());

Map clears up all of these issues.

  • Map.prototype.get(key) retrieves the value mapped to key. This replaces indexing into a vanilla JavaScript object using dot- or bracket-notation. It returns undefined if the key is not present. There’s no way of providing a default unfortunately.
  • Map.prototype.set(key, value) maps the specified key to the specified value. This will overwrite any existing value for the key or create a new one. This replaces assigning to a vanilla JavaScript object using dot- or bracket-notation. It returns a reference to the instance, so Map.prototype.set is chainable.
  • Map.prototype.has(key) checks for the existence of the specified key in the map, solving the existence issues described above with vanilla JavaScript objects.
  • Map.prototype.delete(key) removes the value mapped to the specified key, returning true if the value was removed and false otherwise. This replaces using the delete keyword with either dot- or bracket-notation.
  • Map.prototype.clear() removes all entries from the map. With vanilla JavaScript objects we would do map = {}, but that just set map to a new empty object as opposed to clearing it out.
  • Map.prototype.size efficiently returns the number of entries in the map.
let steph = new Player('Stephen Curry');
let kobe = new Player('Kobe Bryant');
let lebron = new Player('LeBron James');
let allStarVotes = new Map();

allStarVotes.set(steph, 50)
    .set(kobe, 0)
    .set(lebron, 22);

// output: 50
console.log(allStarVotes.get(steph));

// output: false
console.log(allStarVotes.has('Kevin Durant'));

allStarVotes.delete(kobe);

// output: 2
console.log(allStarVotes.size);

allStarVotes.clear();

// output: 0
console.log(allStarVotes.size);

Iterating

Map provides three methods that return iterators over its data (in insertion order):

  • Map.prototype.keys() returns an iterator over just the keys of the map
  • Map.prototype.values() returns an iterator over just the values of the map
  • Map.prototype.entries() returns an iterator over [key, value] pairs of the map

We haven’t actually talked about iterators and how they work yet (that’s coming up in the next post), but the following code should be pretty self-explanatory:

// log each player name since player
// is a key in the map
for (let player of allStarVotes.keys()) {
    console.log(player.name);
}

// log each all star vote count since
// count is a value in the map
for (let count of allStarVotes.values()) {
    console.log(count);
}

// log each player name and his votes count
// together. Ex: 'Stephen Curry (50)'
// Uses array destructuring to assign [key, value]
// pair into separate variables
for (let [player, count] of allStarVotes.entries()) {
    console.log(`${player.name} (${count})`);
}

We learned earlier that the Map constructor accepts an array of [key, value] pairs. That’s only part of the story: it actually accepts any iterable of [key, value] pairs. This means we can quickly clone a Map object by passing its Map.prototype.entries() iterator to the constructor of a new Map (since, as we just showed, it produces [key, value] pairs):

let allStarVotesCopy = new Map(allStarVotes.entries());

But actually, it gets even better. Map objects have what’s called a default iterator. And that default iterator is Map.prototype.entries(). This means we can clone a Map object by simply passing it into the constructor:

let allStarVotesCopy = new Map(allStarVotes);

The for-of operator also works with a default iterator so we don’t need to call .entries() when we loop:

// log each player name and his votes count
// together. Ex: 'Stephen Curry (50)'
// Uses array destructuring to assign [key, value]
// pair into separate variables
for (let [player, count] of allStarVotes) {
    console.log(`${player.name} (${count})`);
}

And since the constructor takes any iterable, we can also easily merge raw data with a Map object to create a new object by using the spread operator:

let durant = new Player('Kevin Durant');
let cp3 = new Player('Chris Paul');
let theBrow = new Player('Anthony Davis');

let russell = new Player('Russell Westbrook');
let carmelo = new Player('Carmelo Anthony');

let moreAllStarVotes = new Map([
    [durant, 20],
    [cp3, 5],
    [theBrow, 10]
]);
let rawData = [
    [russell, 12],
    [carmelo, 15]
];

let mergedMap = new Map([...allStarVotes, ...moreAllStarVotes, ...rawData]);

The spread operator works with any iterable not just arrays, so we can use it to create a concatenated array literal, which we then pass into the Map constructor. The amount of code or helper libraries it would take to do this in ES5 with vanilla JavaScript objects would be immense.

And don’t worry if you don’t fully understand all of this iterator business. We’ll cover iterators and iterables in great detail in the next post. Hopefully having this background on collections will make that discussion make even more sense.

Map also exposes Map.prototype.forEach(loopFunc), which is similar to Array.prototype.forEach(loopFunc) (introduced in ES5) and offers a functional alternative to looping over Map.prototype.entries():

allStarVotes.forEach((count, player, map) => {
    console.log(`${player.name} (${count})`);
});

The third parameter passed to the function by Map.prototype.forEach() (map in the above example) is a reference back to the map object. In the case of the arrow function we used above, it isn’t necessary because allStarVotes is still in scope. But if we were instead passing a named function, having that third map reference parameter could come in handy.

Although Map does have forEach, it doesn’t have filter or map. You will first have to convert the Map object to an array of [key, value] pairs (using the spread operator like [...allStarVotes]), do the filter/map operation, and then construct a new Map object from the result. Hopefully this functionality will be added in the future.
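
As a sketch of that workaround (the variable names here are mine, not from the post), filtering a Map might look like this:

```javascript
// Filter a Map by spreading it into an array of
// [key, value] pairs, filtering that array, and
// constructing a new Map from the result
let ages = new Map([
    ['Stephen Curry', 28],
    ['Kobe Bryant', 37],
    ['LeBron James', 31]
]);

let under35 = new Map(
    [...ages].filter(([name, age]) => age < 35)
);

// output: 2
console.log(under35.size);
```

The intermediate array is throwaway, but it lets us reuse all of the Array iteration helpers we already know.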

Map vs Object

So now that we’ve learned about all that Map can do, should we replace all uses of vanilla JavaScript objects with Map? Well, not exactly. If you want to map anything other than strings to data values, you have no choice: you need to use Map.

However, if you’re mapping string keys to data values, you have options. A good rough guideline deals with the types of keys in your map. If your keys are fixed/static, just use a vanilla JavaScript object; it’s really simple to do map.keyName. If your keys are dynamic (you’re indexing into the map using variables), then use a Map: map.get(varString).
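
To illustrate that guideline (the variable names are illustrative, not from the post):

```javascript
// Fixed/static keys: a vanilla object is simplest
let dimensions = { width: 100, height: 50 };

// output: 100
console.log(dimensions.width);

// Dynamic keys held in variables: reach for a Map
let clickCounts = new Map();
let buttonId = 'button-' + 42;

clickCounts.set(buttonId, 1);

// output: 1
console.log(clickCounts.get(buttonId));
```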

WeakMap

You’ve probably heard of a map before, but what’s a weak map? A WeakMap is a subset of a Map; you can call it a “Map with restrictions.” It’s not iterable, so it has no .entries(), .keys(), .values() or .forEach() methods, and no .size property or .clear() method either. All keys must be objects; no strings, numbers, booleans or symbols (more on these later) allowed.

let steph = new Player('Stephen Curry');
let kobe = new Player('Kobe Bryant');
let lebron = new Player('LeBron James');
let allStarVotesWeak = new WeakMap();

allStarVotesWeak.set(steph, 50)
    .set(kobe, 0)
    .set(lebron, 22);

// output: 50
console.log(allStarVotesWeak.get(steph));

// output: false
console.log(allStarVotesWeak.has('Kevin Durant'));

allStarVotesWeak.delete(kobe);

It’s probably not immediately apparent why you would use a WeakMap over a normal Map when it’s so restrictive. And what’s up with it only supporting objects as keys? Nicolas Bevacqua explains it well in ES6 WeakMaps, Sets, and WeakSets in Depth:

The difference that may make WeakMap worth it, is in its name. WeakMap holds references to its keys weakly, meaning that if there are no other references to one of its keys, the object is subject to garbage collection.

When you use an object as a key in a Map object, that object key will never get garbage collected as long as the Map object is around, because the Map still holds a reference to it. This can cause memory leaks if nothing else references those object keys. However, if an object key in a WeakMap has no other references to it, it will be removed from the WeakMap object and become available for garbage collection. This prevents the chance of a memory leak.

One use case of a WeakMap is attaching metadata to DOM objects. Let’s say you want to keep track of how many times <p> nodes on the page have been clicked:

// set up metadata click map
let clickMap = new WeakMap();

// on each click, add the p to the map
// (with an initial count) or increment its
// click count
$('p').click(function() {
    let pNode = this;
    let clicks = clickMap.get(pNode);

    if (!clicks) {
        clicks = 0;
    }

    clickMap.set(pNode, clicks + 1);
});

The reason using a WeakMap is advantageous in this example is that if a given <p> node gets removed from the DOM, we don’t have to know when that happens in order to delete it from our map so that the node can be garbage collected. If the node is removed from the DOM and nothing else has a reference to it, then it will automagically be removed from our WeakMap because it held a reference to the node weakly.

Set

A set can be thought of as a subset of a map as well. You could think of it as a map where the keys don’t matter, but the values still need to remain distinct/unique. In fact, because ES5 didn’t have an explicit set data structure, the best workaround has been to use a vanilla JavaScript object, making the elements of our “set” the keys of the object. The values in the object would be some truthy value (like true) to make existence testing easier, but the values themselves didn’t really matter.

let nbaPlayers = {
    'Stephen Curry': true,
    'Kobe Bryant': true,
    'LeBron James': true
};

if (nbaPlayers['Stephen Curry']) { // true
    console.log('Stephen Curry is an NBA player');
}
if (nbaPlayers['Ben Ilegbodu']) { // false :'(
    console.log('Ben Ilegbodu is an NBA player');
}

This works pretty well, but it has the same drawback as using vanilla JavaScript objects as maps: the keys have to be strings. ES6 includes the Set data structure, which works with any values (not just strings).

Constructor

The constructor takes an optional iterable of values that can be added initially to the Set object. If you choose to omit the iterable, then an empty Set object is created.

let steph = new Player('Stephen Curry');
let kobe = new Player('Kobe Bryant');
let lebron = new Player('LeBron James');

let initializedSet = new Set([steph, kobe, lebron]);
let emptySet = new Set();

Handling values

Set shares many of the same properties/methods as Map too:

  • Set.prototype.size
  • Set.prototype.has(value), which is a lot faster than using Array.prototype.indexOf()
  • Set.prototype.delete(value)
  • Set.prototype.clear()

There is no Set.prototype.get() because there are no keys. And Map.prototype.set() is replaced by Set.prototype.add(value) for adding elements to the set. But just like Map.prototype.set(), Set.prototype.add() returns a reference to the instance so it’s chainable.

let steph = new Player('Stephen Curry');
let kobe = new Player('Kobe Bryant');
let lebron = new Player('LeBron James');

let allStars = new Set();

allStars.add(steph)
    .add(kobe)
    .add(steph) // duplicates are ignored
    .add(lebron);

// output: false
console.log(allStars.has('Kevin Durant'));

// output: true
console.log(allStars.has(kobe));

allStars.delete(kobe);

// output: 2
console.log(allStars.size);

allStars.clear();

// output: 0
console.log(allStars.size);

A couple of methods I wish Set came with:

  • Set.prototype.addAll(iterable) to add a list of items to the set in one call instead of having to iterate over a list and call Set.prototype.add() on each iteration
  • Set.prototype.hasAll(iterable) to check to see if every item in the list is in the set
  • Set.prototype.deleteAll(iterable) to delete every item in the list from the set

Maybe they will get added in future specifications.
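
In the meantime, the missing methods are easy to approximate with plain helper functions (these helpers are my own sketch, not part of ES6):

```javascript
// add every item from an iterable to a set
function addAll(set, iterable) {
    for (let item of iterable) {
        set.add(item);
    }
    return set;
}

// true only if every item in the iterable is in the set
function hasAll(set, iterable) {
    return [...iterable].every((item) => set.has(item));
}

// delete every item in the iterable from the set
function deleteAll(set, iterable) {
    for (let item of iterable) {
        set.delete(item);
    }
    return set;
}

let letters = new Set(['a']);

addAll(letters, ['b', 'c', 'a']);

// output: 3
console.log(letters.size);

// output: true
console.log(hasAll(letters, ['a', 'c']));

deleteAll(letters, ['a', 'b']);

// output: 1
console.log(letters.size);
```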

Iterating

Set has the same three iterator-returning methods that Map has:

  • Set.prototype.values() returns an iterator over just the values of the set
  • Set.prototype.keys() returns the same iterator as Set.prototype.values() since Set only has values. It exists for parity with Map
  • Set.prototype.entries() returns an iterator over [key, value] pairs of the set where key and value are the same values. It too exists for parity with Map

The default iterator for Set is Set.prototype.values() so you can easily iterate over a Set object with for-of or Set.prototype.forEach():

for (let allStar of allStars) {
    console.log(allStar.name);
}

allStars.forEach((value, key, setRef) => {
    console.log(value.name);

    // In a set the value & key are the same
    console.log(value === key);

    // The third parameter is a reference to the
    // instance
    console.log(setRef === allStars);
});

It also means that you can easily clone a Set object since the constructor accepts an iterable:

let allStarsClone = new Set(allStars);

Lastly, you can combine Set’s de-duping nature with the spread operator to create a de-dupe array helper:

function dedupe(array) {
    return [...new Set(array)];
}

let noDupesArray = dedupe([1, 2, 1, 4, 7, 3, 1]);

// output: [1, 2, 4, 7, 3]
console.log(noDupesArray);

Set operations

Common non-mutating operations performed on sets are union, intersection, and difference. They’re non-mutating because they don’t change the underlying sets; they return a new Set object containing the resulting items. Unfortunately, the Set object in ECMAScript 6 does not have any of those methods. However, with some other ES6 functionality we can implement them ourselves.

Union implementation

Union (a ∪ b in set notation) is a new Set object that contains the elements of both set a and set b. This is pretty straightforward to implement in ES6. Using the spread operator, we can create an array which is the concatenation of both sets and then create a new Set object from that (which will also de-dupe):

function union(setA, setB) {
    return new Set([...setA, ...setB]);
}

let setUnion = union(
    new Set(['a', 'b', 'c', 'd']),
    new Set(['d', 'e', 'f', 'g'])
);

// output: 7
console.log(setUnion.size);

Intersection implementation

Intersection (a ∩ b in set notation) is a new Set object that contains the elements that exist in both set a and set b. The implementation is a bit more involved, but not too bad. Essentially, we need to include the elements of a in the new set if they exist in b:

function intersection(setA, setB) {
    return new Set([...setA].filter(item => setB.has(item)));
}

let setIntersection = intersection(
    new Set(['a', 'b', 'c', 'd']),
    new Set(['d', 'e', 'f', 'g'])
);

// output: 1
console.log(setIntersection.size);

So what we do is first convert setA into an array (using the spread operator) so we can leverage Array.prototype.filter(). Once we’ve filtered down to the items that are also in setB, we have an array of the intersection, which we convert into a Set object.

Difference implementation

Difference (a \ b) is a new Set object that contains the elements in a that are not in b. Its implementation is similar to intersection except we want the ones in a that are not in b:

function difference(setA, setB) {
    return new Set([...setA].filter(item => !setB.has(item)));
}

let setDifference = difference(
    new Set(['a', 'b', 'c', 'd']),
    new Set(['d', 'e', 'f', 'g'])
);

// output: 3
console.log(setDifference.size);

WeakSet

A WeakSet is basically the combination of a Set and a WeakMap. Just like a Set, it only contains unique values. And just like a WeakMap, it’s not iterable, its values must be objects, and those values are available for garbage collection.

The use case for a WeakSet is similar to that of a WeakMap, except the data you want to store about each object is a simple boolean. Essentially, the presence of the object in the set is all the information you need.

Let’s take our WeakMap example from before, but instead of keeping track of how many times a <p> tag had been clicked, we just want to know that it had been clicked:

// set up set of clicked nodes
let clickedNodes = new WeakSet();

// on each click, add the p to the set
$('p').click(function() {
    let pNode = this;

    clickedNodes.add(pNode);
});

And because the WeakSet holds onto its references weakly, if a DOM node is removed from the DOM or otherwise has no other references to it, it’ll also be removed from the WeakSet object.

Inheriting from collections

So let’s wrap up our discussion and talk about creating derived classes from these new collection objects. We learned in the article on classes that native classes can now be derived in ES6.

We may want to derive from Map to add the following functionality:

  • Map.prototype.get(key, defaultValue) to retrieve a default value when the value for the specified key doesn’t exist (or is undefined)
  • Map.prototype.filter(testFunc) so that we don’t have to convert into an intermediary array to create a new filtered Map
  • Map.prototype.map(mapFunc) so that we don’t have to convert into an intermediary array to create a new Map with mapped values
  • Map.prototype.clone(iterable) as an alternative to passing a Map object to the Map constructor
  • Map.convert(vanillaObj) to go from an ES5-style map to an ES6 Map
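
For instance, the default-value get() might be sketched like this (DefaultMap is a hypothetical name, and this assumes an environment that supports subclassing built-ins natively):

```javascript
class DefaultMap extends Map {
    get(key, defaultValue) {
        let value = super.get(key);

        // fall back when the key is missing
        // (or its value is undefined)
        return value === undefined ? defaultValue : value;
    }
}

let votes = new DefaultMap([['Stephen Curry', 50]]);

// output: 50
console.log(votes.get('Stephen Curry', 0));

// output: 0
console.log(votes.get('Kevin Durant', 0));
```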

We may want to derive from Set to add the following functionality:

  • Set.prototype.addAll(iterable) to add a list of items to the set in one call instead of having to iterate over a list and call Set.prototype.add() on each iteration
  • Set.prototype.hasAll(iterable) to check to see if every item in the list is in the set
  • Set.prototype.deleteAll(iterable) to delete every item in the list from the set
  • Set.prototype.some(testFunc) to return true if testFunc returns true for one of the items in the set
  • Set.prototype.every(testFunc) to return true if testFunc returns true for all of the items in the set
  • Set.prototype.union(otherSet) to union the set with another set
  • Set.prototype.intersection(otherSet) to intersect the set with another set
  • Set.prototype.difference(otherSet) to get the difference between the set and another set
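
A few of those Set additions could be sketched as a derived class like so (XSet is a hypothetical name of my own, not a standard class):

```javascript
class XSet extends Set {
    // true if testFunc passes for at least one item
    some(testFunc) {
        for (let item of this) {
            if (testFunc(item)) {
                return true;
            }
        }
        return false;
    }

    // true if testFunc passes for every item
    every(testFunc) {
        return [...this].every(testFunc);
    }

    // non-mutating union with another set
    union(otherSet) {
        return new XSet([...this, ...otherSet]);
    }
}

let nums = new XSet([1, 2, 3]);

// output: true
console.log(nums.some((n) => n > 2));

// output: false
console.log(nums.every((n) => n > 2));

// output: 4
console.log(nums.union(new XSet([3, 4])).size);
```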

JavaScript engine support

Because these 4 new collections aren’t new syntax, but new APIs, polyfills are needed to provide functionality for Map, Set, WeakMap & WeakSet. Babel & TypeScript partner with the core-js library to provide the polyfills for the 4 collections. Traceur itself has polyfills for Map & Set but not for WeakMap or WeakSet.

All of the modern browsers and server-side JavaScript environments support all 4 collections, so this is the rare case where there is more functionality in the browsers than in the transpilers.

Additional resources

As always, you can check out the Learning ES6 examples page of the Learning ES6 GitHub repo, where you will find all of the code used in this article running natively in the browser. You can also get some practice with ES6 collections using ES6 Katas.

There is also lots of great reading available if you want to deep dive into these ES6 collections.

Coming up next….

After much build up, we’re finally going to talk about iterators and iterables. By now you should have a pretty high-level understanding of how they work, but we’ll deep dive into them so we can have a full understanding. They’ll also provide a nice groundwork for generators to follow afterwards. Until then…

FYI

This Learning ES6 series is actually a cross-posting of a series with the same name on my personal blog, benmvp.com. The content is pretty much exactly the same, except that this series has additional information on how we specifically leverage ES6 here in Eventbrite Engineering when applicable.

Here at Eventbrite, we aren’t using these new collections yet. Although they have all the benefits listed above, we’ve chosen to continue using the ES5 workarounds in order not to add additional weight to the page with the polyfills. core-js is a pretty huge library, and we’re very cautious when it comes to increasing the size of every page. The promises blog post can be found here.

The Lifecycle of an Eventbrite Webhook

At Eventbrite, we have a feature called webhooks. Webhooks can be thought of as the opposite of an API call. When using our API, developers either ask us for information or hand us information; both are initiated by the developer. With a webhook, we proactively notify developers (via an HTTP POST with JSON content) when actions happen on our site. The actions we currently support are as follows:

  • Attendee data is updated
  • An attendee is checked in via barcode scan
  • An attendee is checked out via barcode scan
  • An event is created
  • An event is published
  • An event is unpublished
  • Event data is updated
  • Venue data is updated
  • Organizer data is updated
  • An order is placed
  • An order is refunded
  • Order data is updated

Webhooks are relatively simple to create.  You can create/delete them in our admin web interface.


You can also create/delete them by using the API.

import requests
from pprint import pprint

# This sample creates and then immediately deletes a webhook
def create_webhook():
    response = requests.post(
        "https://www.eventbriteapi.com/v3/webhooks/",
        headers={
            "Authorization": "Bearer YOURPERSONALOAUTHTOKEN",
        },
        data={
            "endpoint_url": "http://www.malina.io/webhook",
            "actions": "",
            "event_id": "26081133372",
        },
        verify=True  # Verify SSL certificate
    )

    pprint(response.json())
    return response.json()[u'id']

def delete_webhook(hook_id):
    response = requests.delete(
        "https://www.eventbriteapi.com/v3/webhooks/" + hook_id + "/",
        headers={
            "Authorization": "Bearer YOURPERSONALOAUTHTOKEN",
        },
        verify=True  # Verify SSL certificate
    )

    pprint(response.json())

if __name__ == '__main__':
    hook_id = create_webhook()
    delete_webhook(hook_id)

When various actions occur within our system, there is a pipeline of infrastructure through which these actions flow in order to finally result in an HTTP post to a webhook URL.  In this post, I’ll describe that pipeline in detail. 

 

Step 1
Some action happens in Eventbrite.  Someone creates an event, or updates one. Someone buys a ticket, etc.  This could happen on eventbrite.com or through one of our mobile apps, or through our API.

 

Step 2
Dilithium detects a change in Eventbrite’s database. Let’s take the example of someone updating an event. You might think that we have one place in the code where all updates of events happen, and that that place is also responsible for publishing to Kafka. However, it turns out that it’s not that simple. Because events are accessed in multiple parts of our codebase, and because we want to *never* miss an action, we watch for changes at the source of truth: our database. We do this via a piece of technology we call Dilithium.

Dilithium is an internal service that directly watches the replication logs of one of our databases.  When it sees “interesting” SQL statements (an insert of an event, an update of an event, etc.) it packages the relevant data (what happened, the ID of the objects, etc.) as JSON and sends it to Kafka.

 

Step 3
Kafka receives a message from Dilithium. Kafka is a messaging system that has become fairly widely used; you can find out more at kafka.apache.org.
For our purposes it can be thought of as a “pub-sub” pipeline. Messages get published to it, and a number of interested consumers subscribe to those messages so they are notified when they happen. Kafka is a good choice for the webhooks pipeline because the actions that cause webhooks to fire are also relevant to other systems at Eventbrite: maybe we need to set/invalidate a cache, update our data warehouse, etc.

 

Step 4
The webhook service receives an action from Kafka. You’ll notice that nowhere in the pipeline up to now do we look at the events and try to match them to an actual webhook. As a result, the webhook service receives many events (the vast majority of them) for which no webhook is registered.

The webhook service, which is part of our Django application, starts by checking whether there is a webhook for any message it receives. It simply uses the same database we discussed before (with some caching provided by memcache) to do this. When it actually finds a webhook, it creates a JSON payload and is ready to make the HTTP request to the third-party developer.

 

The payload is JSON sent in an HTTP POST:

{
    "api_url": "https://www.eventbriteapi.com/v3/events/26081133372/",
    "config": {
        "action": "event.published",
        "endpoint_url": "http://www.malina.io/webhook",
        "user_id": "163054428874",
        "webhook_id": "147601"
    }
}

Let’s take a closer look at what this payload object is made of.  

The ‘api_url’ can be thought of as the address of the data that caused the webhook to fire. You could take that URL, append a personal OAuth token, plug it into your browser, and view that data in Eventbrite’s API explorer.

The ‘action’ represents the change that we saw in the database. In the example above, the publish column of the event table was changed. All possible actions can be found in the bulleted list at the beginning of this post; each of them represents a change in the database.

The ‘endpoint_url’ is the value provided by the developer who registered the webhook, and it is the address to which we send this payload.

The ‘user_id’ is the Eventbrite user id of the user who created the webhook.

The ‘webhook_id’ is the unique id that is assigned to this webhook.

 

Step 5
The final step is sending the actual HTTP request. As you can imagine, this can be (and usually is) the slowest part of the pipeline by far. These URLs are not ours, and we know nothing about them. Maybe they will time out, maybe they will take 20 seconds to respond, maybe they will throw a 500 error and we will want to retry. Due to all these concerns, performing the actual HTTP request from the webhooks service is not feasible; we really need to do them asynchronously. For that we use a common async framework called Celery. We won’t talk in too much detail about Celery, but in brief, it implements a task queueing system that makes it very easy to take a section of code and run it asynchronously. You simply provide Celery with a queueing mechanism (RabbitMQ or SQS, for example) and it takes care of the rest. It’s this easy:

from proj.celery import app
import requests

@app.task
def http_post_request(url, payload):
    response = requests.post(
        url,
        data=payload,
        verify=True  # Verify SSL certificate
    )

    _log_response_to_database(response)

>>> http_post_request.delay(some_url, some_payload)

The celery workers make HTTP requests and store information about the request/response in the webhooks database.  We do this so we have a complete record of all the external communications related to webhooks.

 

Step 6
Sometimes the request to the webhook URL fails. In these cases we try again. Since we store all requests/responses in the database, it is easy to determine whether a particular webhook request failed (and how many times it has failed). To implement our retry policy, we have a cron job that retries failed requests every 10 minutes, up to 10 times. It is written as a Django management command and generally uses the same code path described above to queue requests.

 

Lastly, let’s take a look at some of the things that are made possible by our webhook system. Zapier is one of our Spectrum Partners and is the largest consumer of our webhooks system. Zapier alone has tens of thousands of webhooks registered, which allow tens of thousands of Eventbrite organizers to automate sending their data to any combination of over 500 CRM applications.

 

Eventbrite and SEO: How does Google find our pages?

One thing that took me by surprise when I started researching SEO was that when a user enters a search term, the results are gathered from Google’s representation of the web, not the entire web. For a page to be included in its index, Google must have already parsed and stored the page’s contents in its databases.

To do this, automated robots known as spiders or crawlers scan the internet for links leading to pages they can index. These crawlers will begin scanning one page, then follow the links they find to then scan and index those pages.


This pattern repeats until the search engine has indexed a sizable representation of the web. It stores the meta information and text it finds on each page in its own databases, and it is this data it uses to generate the search engine results pages displayed to users.

Having a website online will not guarantee that Google will find your site and include all of its pages in its rankings. Google needs to find each page through inbound and outbound links, through the website’s own sitemap, or through manual submission. Eventbrite relies on a mixture of these strategies to make sure our pages are included in Google’s index of the web.

Inbound Links

Inbound links are links from other domains that point to your website. Once Google’s crawlers land on a page, they quickly parse its content, including any links that do not specifically tell search engines to ignore them. If website A includes a link to website B, Google will follow that link after it is done parsing website A. The more external sites that link to your site, the better chance Google has of indexing your pages.

Inbound links also play a large part in increasing a site’s relevancy and authority. Google’s main aim is to treat each web page as a user would. Therefore it deems pages that have a lot of natural inbound links as popular and increases their ranking in relevant search results. These links must occur naturally, though, as Google is known to decrease a page’s rank, or remove it from the index entirely, if the majority of its inbound links are from low-authority or irrelevant pages.

Sausalito Arts Festival site links to Eventbrite

Links to our event pages are often included on our organizers’ own sites, which are indexed by Google. We also rely on press releases, news articles and blogs to link to these event pages when covering the event. The more links we are able to accrue from outside sources, the higher our authority score is. This boosts all Eventbrite pages, as Google deems the site trustworthy and popular based on the pages linking to it.

Outbound Links

Once Google has landed on an Eventbrite page, we use internal linking to direct crawlers to other pages we want indexed by Google. We utilize our most popular pages to point to other internal pages we want both users and Google to find. Our homepage is a popular entry point for users therefore Google views any internal links found on the page as important for parsing and indexing. We take advantage of this by listing popular events and links to our category search pages.

We also take a lot of care curating the links within our footer, as they are shown on every page of our site and are a good indicator to Google that these links are important. Some of the links within the footer are dynamic depending on the top-level domain (TLD) visited. A user visiting eventbrite.com will see links to American cities in our footer, whereas users visiting eventbrite.com.au will see Australian cities.

Eventbrite Footer – US TLD

Eventbrite Footer – Australia TLD

We also use breadcrumbs on our public event pages to link to city and category directory pages. Not only do they provide another place for Google to find these pages, but they also allow users to jump quickly to events similar to the one they are currently visiting.

Breadcrumbs trail on Eventbrite event pages

Sitemap

A sitemap is a file (or multiple files) that provides a map for search engines to find all the pages of a site. While it doesn’t replace linking, it does help crawlers find pages they might have missed due to orphaning or an absence of interlinking. Sitemaps also pass along useful metadata about each URL, including when it was last modified and how often the page may change. While you will mainly see sitemaps as XML files, text and RSS file types are also accepted by Google.

For large sites, it is best to break up your sitemap, as Google has a limit of 50,000 URLs and an uncompressed file size limit of 10MB per sitemap. You can then place the URLs of your smaller sitemaps into a sitemap index file. This is the approach we take at Eventbrite, as we have over 10 million pages and growing.

Our main sitemap index holds links to the sitemaps for event pages, directory pages, venue profile pages and organizer pages, along with information on when each sitemap was last modified. Each sitemap then has information on its priority, which gives Google an indication of how often it should come back to index new pages.

A snippet of Eventbrite’s sitemap index

Keep in mind that including a link in the sitemap will not guarantee that Google’s crawlers will index and parse that page. Sitemaps merely suggest links for search engines to index and should not replace linking practices.
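
For reference, a sitemap index file follows the sitemaps.org protocol and generally looks like the following (the URLs here are made up for illustration, not Eventbrite’s actual sitemap locations):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemaps/events1.xml.gz</loc>
    <lastmod>2016-07-28</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemaps/venues1.xml.gz</loc>
    <lastmod>2016-07-27</lastmod>
  </sitemap>
</sitemapindex>
```

Each <sitemap> entry points at one of the smaller sitemap files, and the optional <lastmod> hints to crawlers which files have fresh content.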

Manual Submission

For new sites, it is unrealistic to expect Google’s crawlers to find their pages through inbound links alone. Google allows you to manually submit either a single page or a sitemap through the Google Search Console (formerly Webmaster Tools). Again, it is at Google’s discretion whether it will crawl and index these pages or not.

Google Crawl Budget

Google sets a crawl limit, also known as a crawl budget, for each website. Every website has a different crawl budget, closely linked to its page rank. This means the more relevant and important Google deems your site, the more time it will spend crawling and indexing your pages each time it visits.

Determining factors Google uses to set your crawl budget include your authority score, how often your site is updated, the frequency of new pages added, and individual page speed and size. To increase the number of pages Google indexes on each visit, reduce broken links, as following them wastes crawl time and leaves the crawler with no further links to follow. You should also make sure there are no redirect loops, where page A redirects to page B, which then redirects back to page A. The crawler gets stuck in the loop when it could have been indexing other pages on your site.

Also utilize your robots.txt file: determine which pages are unimportant or low quality, and add rules to disallow crawlers from following and indexing those pages or directories. Eventbrite has over 10 million pages, but only 1.5 million are included in Google’s index. We pay close attention to pages that are low quality, spammy, dated, etc., and restrict Google from indexing them. We also place links we deem important as close to the homepage as possible, or make them easily accessible from our global navigation. A well-thought-out site hierarchy is key to making sure priority pages are indexed and reindexed frequently.

Wrap Up

With over 40 billion web pages on the internet, Google often needs a hand to find new websites and pages. Google is estimated to index only around 10% of the pages on the web. It is important to remember that when a user enters a search term in Google, the pages searched are not the entire web but Google’s representation of it; the results returned are those that Google has found and stored in its databases.

You should not rely solely on one strategy to improve the chances of Google parsing and indexing all the pages on your site. A clear and well-thought-out site hierarchy is important, with every page linked at least once internally. Sitemaps are a great starting point for Google to find your pages, and manual submission is important for new pages that are high priority.

As your site grows and receives more inbound links, Google will prioritize indexing your new pages, as it wants the most relevant and popular pages appearing in search results. Including content that draws users to your site will also increase your presence on search engines. Here at Eventbrite we live by the motto that what is good for SEO should be good for user experience too.