Free

Pre-Spark Summit Meetup in Dublin, Ireland

Event Information

Location

The Convention Center Dublin

Spencer Dock, N Wall Quay

North Dock

Dublin 1

Ireland

Event description

Join us on the eve of Spark Summit EU in Dublin for an evening of technical talks from Apache Spark contributors, committers, and users. This meetup is organized and sponsored by Databricks, the company behind Apache Spark.

The meetup will be held in Room Liffey A at The Convention Center Dublin. It's free and open to anyone. You don't have to be registered for Spark Summit to attend.

AGENDA:

6:00 PM - 6:30 PM Refreshments + Networking

6:30 PM - 6:40 PM Opening and Welcome Remarks

6:40 PM - 7:20 PM TensorFlow on Apache Spark with TensorFrames and DataFrames (Tim Hunter)

7:20 PM - 8:00 PM An introduction to Spark ML Pipelines (Holden Karau)

8:00 PM - 8:40 PM Apache Spark’s Structured Streaming in the Cloud: What to Consider and Why (Bill Chambers)

8:45 PM - 9:00 PM More Networking + Mingling

Abstract Talk-1: TensorFlow on Apache Spark with TensorFrames and DataFrames

Since the creation of Apache Spark, I/O throughput has increased at a faster pace than processing speed. In many big data applications, the bottleneck is increasingly the CPU. With the release of Apache Spark 2.0 and Project Tungsten, Spark runs a number of control operations close to the metal. At the same time, there has been a surge of interest in using GPUs (the graphics processing units of video cards) for general-purpose applications, and a number of frameworks have been proposed for numerical computation on GPUs.

In this talk, we will discuss how to combine Apache Spark with TensorFlow, a new framework from Google that provides building blocks for Machine Learning computations on GPUs. Through a binding between Spark and TensorFlow called TensorFrames, distributed numerical transforms on Spark DataFrames and Datasets can be expressed in a high-level language and still rely on highly optimized implementations.

The developers of the TensorFrames package will provide an overview, a live demo on Databricks and a presentation of the future plans. For experts, this talk will also include some technical details on design decisions, the current implementation, and ongoing work on speed and performance optimizations for numerical applications.

Bio:

Tim Hunter is a software engineer at Databricks and a frequent contributor to the Apache Spark MLlib project. He has been building distributed Machine Learning systems with Spark since version 0.2, before Spark was an Apache Software Foundation project. He is a co-author of Deep Learning Pipelines and GraphFrames and the author of TensorFrames.

Abstract Talk-2: An introduction to Spark ML Pipelines

Are you tired of learning six different libraries to stitch together a machine learning solution? Spark ML's pipeline approach is designed to unify and simplify your machine learning tasks (from training to prediction) while presenting a (close enough to) uniform API. This talk will introduce Spark's ML pipelines and comes with a companion notebook for building a simple ML pipeline. Once you've learned how to build ML pipelines, make sure to come to Spark Summit and see Nick and Holden talk about how to add your own custom algorithms to Spark's ML pipelines.

Bio:

Holden Karau is a transgender Canadian, Apache Spark committer, and co-author of Learning Spark & High-Performance Spark. When not in San Francisco working as a software development engineer at IBM’s Spark Technology Center, Holden talks internationally on Spark and holds office hours at coffee shops at home and abroad. She makes frequent contributions to Spark, specializing in PySpark and Machine Learning. Prior to IBM, she worked on a variety of distributed and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science.

Abstract Talk-3: Apache Spark’s Structured Streaming in the Cloud: What to Consider and Why

Running streaming workloads successfully is a challenge whether you deploy on-premises or in the cloud. While buying a managed service is an option, it's usually quite expensive. Therefore, many companies opt for open-source streaming engines like Apache Spark's Structured Streaming.

This talk will expound on considerations and merits when evaluating engines for streaming workloads in the cloud. In particular, we will focus on two points:

  • The motivation for Structured Streaming and myriad streaming workloads

  • Lessons learned while working with hundreds of streaming workloads in the cloud

Apache Spark’s Structured Streaming consolidates all big data processing under a unified API. Built on the foundation of the Spark SQL engine, Structured Streaming not only allows developers to express the same queries for batch as for streaming, but also adopts different execution strategies for stream processing, including micro-batching for high throughput and continuous processing for low latency.

In this session, we will expand on the internals of the Structured Streaming engine and share why it’s suitable for a variety of use cases. Working with hundreds of streaming workloads spanning diverse requirements, we have amassed practitioners’ knowledge and lessons learned. These include:

- How to successfully create business value with streaming

- What makes a successful streaming use case and what doesn’t

- A decision framework for choosing a streaming engine

- Unique advantages of the cloud (both in storage and compute)

- How to effectively leverage persistent cloud storage like S3 and Azure Blob Store

- How to successfully monitor and maintain your streaming applications

- Structured Streaming’s future development

Bio:

Bill Chambers is a Product Manager at Databricks and a contributor to Apache Spark. At Databricks he leads internal data science initiatives and Spark-related product development, especially Structured Streaming.

He co-authored the book Spark: The Definitive Guide with Matei Zaharia for O'Reilly Press and led the efforts behind it. He has been using Apache Spark since his early days at UC Berkeley, where he received an M.Sc. in Information Management.
