End-to-End Streaming ML Recommendation Pipeline: Spark 2.0, Kafka, and TensorFlow
Sat, June 4, 2016, 9:00 AM – 5:00 PM PDT
The goal of this workshop is to build an end-to-end, streaming recommendations pipeline using the latest streaming analytics tools inside a portable, take-home Docker Container in the cloud.
First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, Kafka, Zeppelin, IPython, and Elasticsearch.
Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.
Last, we productionize our pipeline and serve live recommendations to our users!
You’ll learn how to...
Create a complete, end-to-end streaming data analytics pipeline
Interactively analyze, approximate, and visualize streaming data
Generate machine learning, graph & NLP recommendation models
Productionize our ML models to serve real-time recommendations
Perform a hybrid on-premise and cloud deployment using Docker
Customize this workshop environment to your specific use cases
Part 1 (Analytics and Visualizations)
Verify Environment Setup (Docker)
Analytics and Visualizations Overview
Notebooks (Zeppelin, Jupyter/IPython)
Interactive Data Analytics (Spark SQL, Hive, Presto)
Graph Analytics (Spark GraphX, NetworkX)
Visualizations (Kibana, Matplotlib, D3)
Approximate Queries (Spark SQL, Redis, Algebird)
Spark Job/Workflow Management (Airflow)
Demo(s): Many Notebook-based Demos including Python and Scala
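Approximate query structures like those in Algebird trade exactness for constant memory. As a minimal, pure-Python sketch of one such structure, a Count-Min Sketch for approximate frequency counts (the hashing scheme and table dimensions here are illustrative, not Algebird's):

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counter: may overestimate, never underestimates."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One independent hash per row, derived from a salted digest.
        for row in range(self.depth):
            h = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
            yield row, int(h, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Hash collisions only inflate counts, so the minimum across
        # rows is the tightest (upper-bound) estimate.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for word in ["spark"] * 5 + ["kafka"] * 3 + ["flink"]:
    cms.add(word)
print(cms.estimate("spark"))  # >= 5 (exact here; overestimates are possible at scale)
```

The memory footprint is fixed by width and depth regardless of how many distinct items stream through, which is what makes this family of structures suitable for streaming analytics.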
Part 2 (Spark Streaming ML Recommendations)
Streaming Engines (Kafka Streams, Spark Streaming, Flink Streaming)
Cluster-based Recommendations (Spark ML, Scikit-Learn)
Graph-based Recommendations (Spark ML, Spark Graph)
Collaborative-based Recommendations (Spark ML, Spark Graph)
NLP-based Recommendations (Spark ML, Stanford CoreNLP, NLTK)
Demo: Build and Deploy an End-to-End Spark ML Recommendation Engine
Demo: Incremental Spark ML Model Training from Streaming Batches
** Demo: Build a LARGE Scale Distributed Spark Cluster (~1000 Cores, 5 TB RAM)!!
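Spark ML's ALS handles collaborative filtering at cluster scale; as a tiny pure-Python illustration of the underlying idea, here is matrix factorization trained with stochastic gradient descent (the latent dimension, learning rate, and ratings are arbitrary toy values):

```python
import random

random.seed(0)

# Sparse (user, item, rating) observations on a 1-5 scale.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 4.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2  # k = latent factor dimension

# Small random latent factors for users (P) and items (Q).
P = [[random.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
Q = [[random.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]

def predict(u, i):
    # Predicted rating is the dot product of user and item factors.
    return sum(P[u][f] * Q[i][f] for f in range(k))

lr, reg = 0.02, 0.01
for epoch in range(2000):
    for u, i, r in ratings:
        err = r - predict(u, i)
        for f in range(k):  # SGD step on both factor vectors
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)

# After training, predictions approach the observed ratings, and
# predict(u, i) for unobserved (u, i) pairs serves as a recommendation score.
print(round(predict(0, 0), 1))
```

ALS reaches the same kind of factorization by alternating closed-form least-squares solves over the user and item matrices, which parallelizes far better across a cluster than this sequential SGD loop.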
Part 3 (Spark + TensorFlow)
Machine Learning vs. Deep Neural Networks
Deep Neural Network Fundamentals
Tensors, NDArrays, Matrix Multiplication
Stochastic Gradient Descent
Activation Functions (Sigmoid and Hyperbolic Tangent)
TensorFlow Core and Debugging
Demo: Build, Execute, and Debug a TensorFlow Computational Graph with TensorBoard
Demo: Build and Execute a Custom TensorFlow Operation into your Computation Graph
** Demo: Build a LARGE Scale Distributed TensorFlow Cluster (~1000 Cores, 5 TB RAM)!!
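TensorFlow wires these fundamentals into a computational graph for you; as a pure-Python refresher on the building blocks listed above (matrix multiplication, sigmoid and tanh activations, and an SGD update), here is a hedged sketch of one neuron trained on one example, with arbitrary inputs and learning rate:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):  # alternative activation; both squash to a bounded range
    return math.tanh(x)

def matvec(W, x):
    # Matrix-vector product: one row dot product per output element.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

print(matvec([[1, 2], [3, 4]], [1.0, 1.0]))  # [3.0, 7.0]

# One neuron: y = sigmoid(w . x + b), trained toward target 1.0.
x, target = [1.0, 0.5], 1.0
w, b, lr = [0.1, -0.2], 0.0, 0.5

for step in range(200):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = sigmoid(z)
    # Squared-error loss L = (y - target)^2; chain rule gives dL/dz.
    dz = 2 * (y - target) * y * (1 - y)   # sigmoid'(z) = y * (1 - y)
    w = [wi - lr * dz * xi for wi, xi in zip(w, x)]  # SGD update
    b -= lr * dz

print(round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b), 2))
```

In TensorFlow the same forward pass is expressed as graph operations, and the chain-rule gradients above are derived automatically by the framework rather than written by hand.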
Part 4 (Save and Download Your Work)
Save and Download your Docker Container as a tar file
Interest in learning more about the streaming data pipelines that power real-time machine learning models and visualizations
Interest in building more intuition about machine learning, graph processing, natural language processing, statistical approximation techniques, and visualizations
Interest in learning the practical applications of a modern, streaming data analytics and recommendations pipeline
Interest in learning about Distributed Deep Neural Network Frameworks including TensorFlow
Prerequisites
Basic familiarity with Unix/Linux commands
Experience in SQL, Java, Scala, Python, or R
Basic familiarity with linear algebra concepts (dot product)
Laptop with an ssh client and modern browser
Every attendee will get their own fully configured cloud instance running the entire environment
At the end of the workshop, you can save and download your Docker Container to your local laptop in the form of a Docker Image tar file.
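A sketch of that save/download step using standard Docker CLI commands; the container and image names below are placeholders for whatever the workshop environment actually uses:

```shell
# Snapshot the running container (placeholder name) as an image,
# then export that image to a tar file you can take home.
docker commit my-workshop-container my-workshop-image:final
docker save -o my-workshop-image.tar my-workshop-image:final

# Later, on your own laptop:
docker load -i my-workshop-image.tar
docker run -it my-workshop-image:final
```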