This event has ended

Databricks Hands-on Apache Spark Workshop with Paco Nathan (Chicago)

Global Data Geeks

Monday, August 25, 2014 from 8:30 AM to 4:30 PM (CDT)

Chicago, IL

Event Details

This is a full-day course on Apache Spark, led by Paco Nathan, Spark evangelist for Databricks and an early Spark adopter. In this workshop, Paco will share best practices and use cases from teams who have deployed Spark at enterprise scale.

Format:

  • full-day course (8 hours)

  • requires: wifi, a laptop, and limited use of AWS (access to the instructor’s cluster)

Audience:

  1. developers who are not familiar with Spark, learning to create apps

  2. Java developers who are familiar with Hadoop, just learning Scala

  3. Python developers (PySpark), but this also teaches some basics of Scala

Goals:

  • show how to install/deploy on major Hadoop distros

  • enable participants to deploy a Spark app back at the office (or in the cloud)

Course Outline


Getting Started:

  • downloads

  • show SBT primer

  • initial intro apps: showing Python and Scala


Progression through the full software development lifecycle, step by step:

  • how to “Think Spark”

  • how to build a JAR

  • run JAR on laptop

    • ideally no VMs (except Windows?)

  • deploy JAR to Hadoop cluster, via these alternatives:

    • install on CM

    • just run JAR on YARN

    • use SIMR? (run the Spark shell within an MR job)

  • review UI features

    • is my job still running?

    • how to diagnose / troubleshoot


Unifying the pieces into a single app: SQL, Spark, Streaming, Shark

  • showing how the same business logic can be deployed across multiple topologies

  • demo a Spark Streaming app


Advanced topics (lecture/demo only)

  • GraphX

  • MLlib

  • BlinkDB

  • Tachyon

Success Criteria


By the end of the day, participants will be comfortable performing the following:

  • open a Spark Shell

  • explore data sets loaded from HDFS

  • use some ML algorithms

  • use Spark SQL

Instructor Bio:

Paco Nathan (@pacoid) is a “player/coach” who has led innovative data teams building large-scale apps for 10+ years and has worked as an OSS evangelist for the past 2+ years. He is an expert in distributed systems, machine learning, cloud computing, and functional programming, with a focus on enterprise data workflows. Paco is an O'Reilly author and an advisor for several firms, including The Data Guild and Zettacap. He received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 30+ years of technology industry experience, ranging from Bell Labs to early-stage start-ups.

Paco is the author of the O'Reilly book Enterprise Data Workflows with Cascading.

When & Where


MicroTek
230 W Monroe St
#550
Chicago, IL 60606

