Spark Summit 2013
When and where
Location
The Hotel Nikko, 222 Mason Street, San Francisco, CA 94102
Description
The team at Databricks is excited to announce the first Spark Summit, an event to bring the Apache Spark community together on Monday, December 2, 2013. Come hear from leading production users of Spark, Shark, Spark Streaming and related projects; find out where the project development is going; and learn how to use the Spark stack in a variety of applications.
Apache Spark is a Hadoop-compatible computing system that makes big data analysis drastically faster, through in-memory computation, and simpler to write, through easy APIs in Java, Scala and Python. With a fast-growing community of 20+ companies contributing to the project, this event is a chance to learn from the leading organizations in big data.
Want to get hands-on with the Spark stack? When registering for the Summit, you can also sign up to attend a Spark Training on December 3, the day after the event.
Spark Summit – Mon Dec 2, 2013
- 8-9am Breakfast
- 9-10:30am Keynotes #1
- The State of Spark, and Where We’re Going
Matei Zaharia (CTO, Databricks; Assistant Professor, MIT)
- Turning Data into Value
Ion Stoica (CEO, Databricks; CTO, Conviva; Co-Director, UC Berkeley AMPLab)
- Big Data Research in the AMPLab
Mike Franklin (Director, UC Berkeley AMPLab)
- 10:30-10:45am Coffee Break
- 10:45am-12:00pm Keynotes
- Hadoop and Spark Join Forces in Yahoo
Andy Feng (Distinguished Architect, Cloud Services, Yahoo)
- Integration of Spark/Shark into the Yahoo! Data and Analytics Platform
Tim Tully (Distinguished Engineer/Architect, Yahoo)
- 12:00-1:00pm Lunch
- Track A
- 1:00-2:45pm Applications, Part 1
- Mapping and manipulating the brain at scale (30 min) Jeremy Freeman, HHMI Janelia Farm Research Campus
- Beyond Word Count – Productionalizing Spark Streaming (30 min) Ryan Weald, Sharethrough
- Sharing is Caring: Enabling Data Science Teams with Laburnum (15 min) Austin Gibbons, Quantifind
- A Full-Featured Enterprise Big Data Analytics Solution, Powered By Spark (30 min) Christopher Nguyen, Adatao
- Track B
- 1:00-2:45pm Deployment
- Making Spark Fly: Building elastic and highly available Spark clusters with Amazon Elastic MapReduce (30 min) Parviz Deyhim, Amazon.com
- SIMR: Let your Spark Jobs Simmer Inside Hadoop Clusters (15 min) Ahir Reddy, Databricks
- Flint: Making Sparks (and Sharks!) (15 min) Jim Donahue, Adobe
- Spark Integration Into an Enterprise BigData Stack: Successes & Challenges (15 min) Konstantin Boudnik, WANdisco
- Packaging Spark for Fedora (15 min) William Benton, Red Hat, Inc
- Spark on Elastic Mesos, for Enterprise Use Cases (15 min) Paco Nathan, Mesosphere
- 2:45-3:00pm Break
- Track A
- 3:00-4:30pm Applications, Part 2
- Real-Time Analytical Processing (RTAP) Using the Spark Stack (30 min) Jason Dai, Intel
- Using Spark and Shark for fast cycle analysis on diverse data (30 min) Vaibhav Nivargi, ClearStory Data
- Yahoo Audience Expansion: Migration from Hadoop Streaming to Spark (30 min) Gavin Li, Jaebong Kim, Andy Feng, Yahoo
- Track B
- 3:00-4:30pm Sched & Perf
- Understanding the Performance of Spark Applications (30 min) Patrick Wendell, Databricks
- Next-Generation Spark Scheduling with Sparrow (30 min) Kay Ousterhout, UC Berkeley AMPLab
- Resource management and Spark as a first class data processing framework on Hadoop (30 min) Sandy Ryza, Cloudera
- 4:30-4:45pm Break
- Track A
- 4:45-6:00pm Applications, Part 3
- Towards Distributed Reinforcement Learning for Digital Marketing with Spark (15 min) Nedim Lipka, Adobe
- One platform for all: real-time, near-real-time, and offline video analytics on Spark (30 min) Davis Shepherd, Xi Liu, Conviva
- Collective Intelligence for Data Center Operations Management (15 min) Xiaojun Liu, CloudPhysics
- Unveiling the TupleJump Platform (15 min) Rohit Rai, Satyaprakash, TupleJump
- Track B
- 4:45-6:00pm Spark Related Projects
- Catalyst: A Query Optimization Framework for Spark and Shark (15 min) Michael Armbrust, Databricks
- Deep Dive into BlinkDB: Querying Petabytes of Data in Seconds using Sampling (30 min) Sameer Agarwal, UC Berkeley AMPLab
- Spark Job Server (15 min) Evan Chan, Kelvin Chu, Ooyala
- StratioDeep: an integration layer between Spark and Cassandra (15 min) Oscar Méndez, Luca Rosellini, Alvaro Agea, Stratio
- 6:00-6:10pm Wrap-up talk
- 6:10-8pm Reception
Spark Training – Tue Dec 3, 2013
- 8-9am Breakfast
- 9-10:30am Project tutorials #1
- Parallel programming with Spark
- Fast distributed query processing with Shark
- 10:30-10:45am Break
- 10:45-11:15am Project tutorials #2
- Real-time big data processing with Spark Streaming
- 11:15am-12:30pm Introductory hands-on exercises
- Spark and Shark command-line exercises
- Simple Spark standalone jobs
- 12:30-1:30pm Lunch
- 1:30-3:00pm Advanced hands-on exercises
- Writing a Spark Streaming job
- Machine Learning with Spark
- 3:00-4:00pm Administering Spark
- Installing Spark on Common Hadoop Distributions and EC2
- Monitoring and Troubleshooting Spark
Apache Spark, Spark, the Spark logo, and Apache are trademarks of the Apache Software Foundation and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse the materials provided at this event. Apache Spark is an effort undergoing incubation at The Apache Software Foundation (ASF). For more details about incubation, see the "Apache Incubator Notice" on the Spark Homepage.