The goal of this workshop is to build an end-to-end streaming recommendations pipeline using the latest streaming analytics tools, packaged in a portable, take-home Docker container running in the cloud.
First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, Kafka, Zeppelin, IPython, and Elasticsearch.
Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.
Last, we productionize our pipeline and serve live recommendations to our users!
You’ll learn how to...
Create a complete, end-to-end streaming data analytics pipeline
Interactively analyze, approximate, and visualize streaming data
Generate machine learning, graph & NLP recommendation models
Productionize our ML models to serve real-time recommendations
Perform a hybrid on-premise and cloud deployment using Docker
Customize this workshop environment to your specific use cases
Part 1 (Analytics and Visualizations)
Analytics and Visualizations Overview (Live Demo!)
Verify Environment Setup (Docker)
Notebooks (Zeppelin, Jupyter/IPython)
Interactive Data Analytics (Spark SQL, Hive, Presto)
Graph Analytics (Spark GraphX, NetworkX, TitanDB)
Time-series Analytics (Cassandra)
Visualizations (Kibana, Matplotlib, D3)
Approximate Queries (Spark SQL, Redis, Algebird)
Workflow Management (Airflow)
Part 2 (Streaming and Recommendations)
Streaming and Recommendations Overview (Live Demo!)
Streaming (NiFi, Kafka, Spark Streaming, Flink)
Cluster-based Recommendation (Spark ML, Scikit-Learn)
Graph-based Recommendation (Spark ML, Spark GraphX)
Collaborative Filtering-based Recommendation (Spark ML)
NLP-based Recommendation (CoreNLP, NLTK)
Geo-based Recommendation (Elasticsearch)
Hybrid On-Premise+Cloud Auto-Scale Deploy (Docker)
Customize the Workshop Environment for Your Use Cases
Target Audience
Interest in learning more about the streaming data pipelines that power real-time machine learning models and visualizations
Interest in building more intuition about machine learning, graph processing, natural language processing, statistical approximation techniques, and visualizations
Interest in learning the practical applications of a modern, streaming data analytics and recommendations pipeline
Anyone who wants to try 3D-printed PANCAKES!!
Prerequisites
Basic familiarity with Unix/Linux commands
Experience in SQL, Java, Scala, Python, or R
Basic familiarity with linear algebra concepts (dot product)
Laptop with an SSH client and a modern browser
Every attendee will get their own fully configured cloud instance running the entire environment
At the end of the workshop, you will be able to save and download your environment to your local laptop in the form of a Docker image
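As a quick refresher on the dot product listed among the prerequisites, here is a minimal Python sketch of how it can be used to score items for a user. The vectors, the item names, and the `rank_items` helper are all hypothetical and chosen for illustration; this is not the workshop's actual code, only the kind of arithmetic that sits underneath collaborative filtering.

```python
def dot(u, v):
    """Return the dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def rank_items(user_vec, item_vecs):
    """Rank item names by their dot-product score against the user vector."""
    scores = {name: dot(user_vec, vec) for name, vec in item_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical preference vector for one user (illustrative values only).
user = [0.5, 1.0, 0.0]

# Hypothetical feature vectors for three items (illustrative values only).
items = {
    "item_a": [1.0, 0.0, 2.0],
    "item_b": [0.0, 2.0, 1.0],
    "item_c": [1.0, 1.0, 1.0],
}

# Items with a larger dot product against the user vector rank higher.
ranking = rank_items(user, items)
print(ranking)
```

A larger dot product means the item's features line up more closely with the user's preferences, which is why the prerequisite is worth brushing up on before the recommendation sessions.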