Metaflow Tutorial - useR! 2020

By Netflix ML Infrastructure
Online event
Aug 7, 2020 at 4:00 pm UTC
Overview

Metaflow is an open-source R package from Netflix for building and managing production-grade data projects. Savin, Jason, and Bryan are ready for you!

When?

August 7th, 4 PM UTC - 7 PM UTC

What?

Models are only a small part of an end-to-end data science project. Production-grade projects rely on a thick stack of infrastructure. At a minimum, projects need data and a way to perform computation on it. Data is accessed from a data warehouse, which can be a folder of files or a multi-petabyte data lake. The modeling code that crunches the data is executed in a compute environment, which can range from a laptop to a large-scale container management system.

How do you architect the code to be executed? How do you version the code, input data, and models produced? After the model has been deployed to production, how do you monitor its performance? How do you deploy new versions of your code to run in parallel with the previous version? The software industry has spent over a decade perfecting DevOps best practices for normal software. We are just getting started with data science.

Metaflow is a human-friendly R package that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost the productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning. With Metaflow, you can write your models and business logic as idiomatic R code with any of your favorite machine learning or data science libraries and not worry about any infrastructural concerns.

In this tutorial, participants will be able to write, iterate upon, and deploy their own production-ready models using packages they all know and love, such as caret and mlr.

Who?

The workshop assumes that you are familiar with basic R and the RStudio IDE. This includes topics such as installing packages, assigning variables, and writing functions.

You will need an internet-connected computer running Linux, macOS, or Windows 10 (with WSL 2), since these are the operating systems that Metaflow currently supports. You will also need a recent version of R, RStudio, and Python 3, along with the ability to download packages from the internet.

Please follow the instructions here to install Metaflow. If you run into any issues or have questions, please reach out to us on Gitter.

Instructors

Savin Goyal

Savin is a software engineer at Netflix responsible for Metaflow, Netflix's ML platform. Savin focuses on building generalizable infrastructure to accelerate the impact of data science at Netflix and has spoken before at useR! 2019 on metaflow-r.

Jason Ge

Jason Ge is a software engineer on the Netflix Machine Learning Infrastructure team, building Metaflow, a human-centric machine learning platform. Jason is the author of the R package Picasso, which has been downloaded 29K+ times. At Netflix, Jason has been helping data scientists improve their productivity across different use cases using Metaflow as a developer-friendly infrastructure toolkit.

Bryan Galvin

Bryan leads Data Science at the LA Times. Bryan has spoken before at useR! 2018 on metaflow-r and has substantial experience leading tutorial sessions internally and externally.
