Intro to Spark & Data Science Hands-on Lab
Event Information
Description
IBM is hosting a free, hands-on lab centered around Spark and Data Science Experience (DSX).
Data Science Experience is a platform that brings together the tools a data scientist needs. It includes the most popular open source libraries along with some unique value-added features. DSX has community and social features to help users learn, create and collaborate. The platform is built on Spark and makes Spark available for experimentation and model building.
Who should attend? This lab is for Data Scientists (or people who want to transition into data science), Data Analysts, Data Engineers, or anyone interested in broadening their skills.
Prerequisites:
- Sign up for free Bluemix (www.bluemix.net) and DSX (http://datascience.ibm.com) accounts
- You must bring your own laptop
Continental Breakfast & Lunch will be served. Parking will be validated.
Agenda
Spark/DSX introduction
- Connecting to data sources
- Spark Overview
- Data Structures
- Accumulators, broadcast variables
Lab 1: Data ingest, exploration and visualization
Lunch
Lab 2: Spark SQL
- Filtering, selecting
- Type casting
- Conditional expressions (if, when)
Lab 3: Spark MLlib
- Common feature engineering
- Examples with classification models