Apache Spark Workshop
Sat, April 22, 2017, 8:00 AM – 1:00 PM EDT
A half-day Apache Spark workshop, starting from scratch.
Dell EMC will be providing breakfast!
- 8:00am - 9:00am: Breakfast and networking
- 9:00am: Workshop starts
- 12:30pm: We should be done by this time.
We will look at:
0. Getting Spark ready on your computer.
1. Apache Spark Computation Model: RDDs (Resilient Distributed Datasets).
- Lazy nature of RDDs.
- RDD transformations vs actions.
- How to create RDDs from different sources (let's play with some datasets).
- RDD API. How to bend it to our needs.
2. Spark SQL.
- Interoperability with RDDs.
- SQL on top of RDDs.
- Accessing tabular data sources.
- DataFrame optimizations on top of RDDs.
- Distributed SQL engine case study.
3. Spark Streaming.
- Streaming API.
- Streaming Sources.
- Extending the API.
- Twitter case study.
- Interoperability with SQL and RDDs.
You are expected to follow along with the code examples, so we will help you install Spark locally during the first part of the session.