End-to-End Streaming ML Recommendation Pipeline: Spark 2.0, Kafka, TensorFlow


São Paulo

Event description


Architecture Diagram

The goal of this workshop is to build an end-to-end streaming recommendations pipeline using the latest streaming analytics tools, all inside a portable, take-home Docker container running in the cloud.

First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, Kafka, Zeppelin, IPython, and Elasticsearch.
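
To give a feel for what "approximating" streaming data can mean, here is a minimal sketch of a Count-Min Sketch, a fixed-memory structure that estimates how often each item appears in a stream. It is an illustration in plain Python, not the workshop's code: the class name, the `width` and `depth` parameters, and the sample `events` list are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter that uses a fixed amount of memory."""

    def __init__(self, width=1024, depth=4):
        # width and depth are illustrative defaults, not tuned values
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, item):
        # Pick one column per row using a hash salted with the row number
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._indexes(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Hash collisions can only inflate a cell, so the minimum across
        # rows is the tightest estimate and never undercounts
        return min(self.table[row][col] for row, col in self._indexes(item))


# Hypothetical click stream; in a real pipeline these would arrive continuously
events = ["item-a", "item-b", "item-a", "item-c", "item-a", "item-b"]

sketch = CountMinSketch()
for event in events:
    sketch.add(event)

print("approximate count for item-a:", sketch.estimate("item-a"))
```

The trade-off is accuracy for memory: the table size stays constant no matter how many distinct items pass through, at the cost of occasional overestimates.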

Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.
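
As a rough illustration of collaborative filtering, the sketch below scores unseen items for a user by comparing items through the ratings they share. It is a small item-based variant written in plain Python under stated assumptions: the `ratings` data, the user and item names, and the helper functions are hypothetical, and the workshop itself may use a different approach (for example, a Spark ML model).

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"item-a": 5.0, "item-b": 3.0},
    "bob": {"item-a": 4.0, "item-b": 2.0, "item-c": 5.0},
    "carol": {"item-b": 4.0, "item-c": 4.0},
}


def item_vectors(ratings):
    """Invert the ratings into item -> {user: rating}."""
    vectors = {}
    for user, items in ratings.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by a similarity-weighted average
    of the ratings that user gave to other items."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, candidate_vector in vectors.items():
        if candidate in seen:
            continue
        weighted, total = 0.0, 0.0
        for item, score in seen.items():
            similarity = cosine(candidate_vector, vectors[item])
            weighted += similarity * score
            total += similarity
        if total > 0:
            scores[candidate] = weighted / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


recommendations = recommend("alice", ratings)
print(recommendations)
```

Here the only item "alice" has not rated is "item-c", so it is the single candidate; its predicted score falls between the ratings she gave to the items it resembles.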

Last, we productionize our pipeline and serve live recommendations to our users!

You’ll learn how to...

  • Create a complete, end-to-end streaming data analytics pipeline

  • Interactively analyze, approximate, and visualize streaming data

  • Generate machine learning, graph & NLP recommendation models

  • Productionize our ML models to serve real-time recommendations

  • Perform a hybrid on-premises and cloud deployment using Docker

  • Customize this workshop environment to your specific use cases


Part 1 (Analytics and Visualizations)

Analytics and Visualizations Overview (Live Demo!)

Verify Environment Setup (Docker)

Notebooks (Zeppelin, Jupyter/IPython)

Interactive Data Analytics (Spark SQL, Hive, Presto)

Graph Analytics (Spark Graph, NetworkX, TitanDB)

Time-series Analytics (Cassandra)

Visualizations (Kibana, Matplotlib, D3)

Approximate Queries (Spark SQL, Redis, Algebird)

Workflow Management (Airflow)

Part 2 (Streaming and Recommendations)

Streaming and Recommendations Overview (Live Demo!)

Streaming (NiFi, Kafka, Spark Streaming, Flink)

Cluster-based Recommendation (Spark ML, Scikit-Learn)

Graph-based Recommendation (Spark ML, Spark Graph)

Collaborative-based Recommendation (Spark ML)

NLP-based Recommendation (CoreNLP, NLTK)

Geo-based Recommendation (Elasticsearch)

Hybrid On-Premises+Cloud Auto-Scale Deploy (Docker)

Customize the Workshop Environment for Your Use Cases

Target Audience

  • Interest in learning more about the streaming data pipelines that power real-time machine learning models and visualizations

  • Interest in building more intuition about machine learning, graph processing, natural language processing, statistical approximation techniques, and visualizations

  • Interest in learning the practical applications of a modern, streaming data analytics and recommendations pipeline

  • Anyone who wants to try 3D-printed PANCAKES!!

Prerequisites


  • Basic familiarity with Unix/Linux commands

  • Experience in SQL, Java, Scala, Python, or R

  • Basic familiarity with linear algebra concepts (dot product)

  • Laptop with an ssh client and modern browser

  • Every attendee will get their own fully-configured cloud instance running the entire environment

  • At the end of the workshop, you will be able to save and download your environment to your local laptop in the form of a Docker image
