This event will fill up. We encourage you to RSVP.
7:00 - 8:15 Talk / Q&A
For the last few months, people have been asking to hear more about Mesos, the new project that provides dynamic resource sharing for clusters. Paco Nathan, who will be in Austin to offer his Data Science Workshop, has been working closely with the Mesos team for quite some time, so we asked Paco if he could spare an evening to tell us what's so exciting about Mesos.
Paco informed Tobias Knaup, co-founder of Mesosphere, that he was coming to Austin, and invited him to come as well. Tobias, former Tech Lead at Airbnb, is an expert in machine learning, natural language processing, and sentiment analysis -- and he is a Rails programmer.
This is a great opportunity to learn about Mesos from two of its leading experts.
What's so exciting about Mesos?
Just a few weeks ago, there was a Wired article about Borg and Omega, as well as some of the other projects that provide some of Google's secret sauce for their data centers.
These projects have been going on at Google for a long time. One sign of their value: when Googlers leave to work elsewhere, one of the first things they lament is that they no longer have the same infrastructure they had at Google.
Enter Mesos -- Ben Hindman created it for his dissertation at Berkeley's AMPLab. It's open source and was recently accepted as an Apache project. Word is that it functions as a replacement for Borg -- maybe even a little better or different, closer to what is allegedly Omega.
Imagine -- you've got petabytes of data you're running over in one cluster. You've got another cluster running Memcache, Heroku, or some web apps or whatever. You're serving stuff, so when you crunch everything, you've got to go over the wire to get your data to the other cluster. Wouldn't it make more sense to run a slightly bigger cluster that ran both, so you don't have to pay the tax of sending everything back and forth over the wire? Come learn about Mesos.
Interested in Spark?
In order to use Spark, you need to install Mesos.
Spark allows you to run MapReduce jobs together with your data on distributed machines. Unlike Hadoop, Spark can distribute your data in slices and store it in memory, so your processing and data are co-located in memory. This gives an enormous performance boost. Spark is more than MapReduce, however. It offers a new distributed framework on which different distributed computing paradigms can be modelled. Examples include: Hadoop's Hive => Shark (40x faster than Hive), Google's Pregel / Apache's Giraph => Bagel, etc. The upcoming Spark Streaming is expected to bring real-time stream processing to the framework.
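To get a feel for the style of computation described above, here is a minimal sketch of the classic word-count job written against a toy, in-memory stand-in for Spark's RDD API. The `ToyRDD` class is purely illustrative (not part of Spark); the operation names (`flatMap`, `map`, `reduceByKey`) mirror the real Spark API shape, but everything here runs in a single Python process with no Spark installed:

```python
class ToyRDD:
    """A toy, single-process stand-in for a Spark RDD (illustration only)."""

    def __init__(self, data):
        self.data = list(data)

    def flatMap(self, f):
        # Apply f to each element and flatten the results into one collection.
        return ToyRDD(x for item in self.data for x in f(item))

    def map(self, f):
        # Apply f to each element, one output per input.
        return ToyRDD(f(item) for item in self.data)

    def reduceByKey(self, f):
        # Combine all values sharing the same key with the function f.
        acc = {}
        for k, v in self.data:
            acc[k] = f(acc[k], v) if k in acc else v
        return ToyRDD(acc.items())

    def collect(self):
        # Return the materialized results.
        return list(self.data)


# Word count, mirroring how the same pipeline reads in Spark:
lines = ToyRDD(["to be or not", "to be"])
counts = (lines
          .flatMap(lambda line: line.split())
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b))
print(dict(counts.collect()))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In real Spark, the intermediate datasets in a chain like this can be cached in cluster memory, which is what makes iterative jobs so much faster than Hadoop's disk-backed MapReduce.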