Dynamic Talks Silicon Valley: "In-Stream data processing, Data Orchestratio...

Event Information

Location

Santa Clara Convention Center - Room 212

5001 Great America Pkwy

Santa Clara, CA 95054

Event description

Come join us at the third event of our free technical meetup series, "Dynamic Talks", in Silicon Valley!

This is an ongoing meetup series featuring technical talks from some of the leading experts in tech in major cities around the US. Enjoy talks about the most innovative subjects in big data, AI, ML, voice platforms, the cloud, search, and more. Every event is free, with complimentary food and drinks.

This event will be co-organized with the SF Big Analytics meetup group.

Topic: In-Stream data processing, Data Orchestration & More

Agenda

[5:45pm - 6:10pm]: Guests arrive, Welcome reception

[6:10pm - 6:15pm]: Introduction

[6:15pm - 6:50pm]: The first talk will be presented by Haoyuan Li on "Data Orchestration for Analytics and AI in the Cloud", followed by a Q&A

[6:50pm - 7:25pm]: The second talk will be presented by Jonas Lagerblad on "Analytics evolution from columnar data stores to transactional data lakes", followed by a Q&A

[7:25pm - 8:05pm]: The third talk will be presented by Max Martynov and Ilya Katsov on "Building an In-stream Data Summarization Pipeline Using Spark", followed by a Q&A

[8:05pm - 8:45pm]: Networking; the event concludes


Talk details:

Alluxio - "Data Orchestration for Analytics and AI in the Cloud"

The cloud has been dramatically changing the landscape of data engineering as well as the behavior of data engineers. Specifically, data storage is migrating from the colocated model (e.g., HDFS) to a more cost-effective, more scalable, but often fully disaggregated and remote data lake model (e.g., AWS S3). This has also created a strong need for data orchestration in the cloud, like what Kubernetes does for container-based workloads, so that data can be presented in the right layout at the right location for data-consuming applications in the cloud.

Originally developed at the UC Berkeley AMPLab as the research project "Tachyon", Alluxio (www.alluxio.io) implements the world’s first open-source data orchestration system in the cloud. Alluxio creates a unified access layer for data-driven applications in big data and ML, enabling Spark, Presto, TensorFlow, and others to transparently access different external storage systems while actively leveraging in-memory cache to accelerate data access.

In this talk, we will present:

- New trends and challenges in the data ecosystem in the cloud era

- Effective data engineering in the cloud world with data orchestration

- Production use cases of popular stacks like Presto or TensorFlow with Alluxio on S3

About the Speakers:

Haoyuan (H.Y.) Li is the Founder and CTO of Alluxio. He graduated with a Computer Science Ph.D. from the AMPLab at UC Berkeley, advised by Prof. Scott Shenker and Prof. Ion Stoica. At the AMPLab, he co-created and led Alluxio (formerly Tachyon), an open source virtual distributed file system. Before UC Berkeley, he received an M.S. from Cornell University and a B.S. from Peking University, all in Computer Science.

Bin Fan is the founding engineer and VP of Open Source at Alluxio, Inc. Prior to Alluxio, he worked for Google to build the next-generation storage infrastructure. Bin received his Ph.D. in Computer Science from Carnegie Mellon University on the design and implementation of distributed systems.


Alteryx - "Analytics evolution from columnar data stores to transactional data lakes"

Self-service analytics has been evolving rapidly over the last two decades. This talk will highlight some of the important innovations that have happened, what they have enabled, and what the next big challenges are going forward. Technologies and research areas that will be covered are: in-memory data stores, ETL, data visualization, the Grammar of Graphics, data catalogs, massively scalable dashboards, algorithm and ML-driven data modeling and query building, Delta Lake (Databricks), Apache Spark, Apache Arrow, R, Python, HyPer, Weld, GraalVM and SQL2 (SlamData). There might also be some anecdotes about companies in the BI/Analytics field that the speaker has interacted with or worked for.

About Jonas Lagerblad:

Jonas Lagerblad is a Sr. Architect at Alteryx with a 20+ year career in the data analytics space, developing BI products that have made it to the Leaders quadrant of the Gartner BI Magic Quadrant for three different companies: (TIBCO) Spotfire, Oracle, and ClearStory Data.


Grid Dynamics - "Building an In-stream Data Summarization Pipeline Using Spark"

In-stream data summarization is important for many applications that deal with extreme data volumes or require low-latency analytics. Although in-stream processing frameworks have been rapidly evolving over the last few years, building fault-tolerant, high-performance in-stream pipelines still represents a challenge for certain types of data summarization. In this talk, we'll present lessons learned from building a fault-tolerant data summarization pipeline for processing 10+ billion events per day. We'll discuss the core Spark/Cassandra-based architecture, failure recovery design, deployment approach, and monitoring components. We'll also discuss techniques for challenging cases of data summarization such as counting distinct elements and finding the most frequent elements in a data stream.
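
For readers new to the topic, here is a minimal sketch of one classic approach to the "most frequent elements" problem mentioned above: the Misra-Gries summary. The abstract does not say which algorithms the speakers use, so this is an illustrative assumption and not their method. The function name `misra_gries` and the sample `events` list are hypothetical, and the sketch is plain Python, not Spark.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Any item that occurs more than n / k times in a stream of n items
    is guaranteed to be among the returned keys. The stored counts are
    underestimates of the true frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical sample data: "a" appears 5 times out of 9 events.
events = ["a", "b", "a", "c", "a", "d", "a", "b", "a"]
candidates = misra_gries(events, k=3)
print(candidates)
```

The summary uses a fixed amount of memory regardless of stream length, which is the property that makes this family of techniques attractive for in-stream processing. A second pass over the data (or an exact counter restricted to the candidates) is needed if true frequencies are required.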

About Max Martynov:

Max Martynov joined Grid Dynamics in 2008 with a mission to establish and lead their High Performance Computing practice. He is currently responsible for defining and executing technology strategy, driving innovation through R&D, and performing architecture and technology consulting for clients. During his tenure at Grid Dynamics, Max has led engagements with a number of major technology and retail companies, including Microsoft, Macy's, Kohl's, and JCPenney. Over the last decade, his focus has evolved from HPC and scalable distributed platforms to Digital Transformation, Cloud, Big Data, DevOps, Microservices architecture, and AI. He is the co-author of the book "Continuous Delivery Blueprint" (2018).

About Ilya Katsov:

Ilya joined Grid Dynamics in 2009, and since then has been leading engagements with a number of major retail and technology companies, focusing primarily on Big Data, Machine Learning, and Economic Modeling. He is currently managing the Industrial AI consulting practice that helps clients become successful AI adopters and deliver innovative AI solutions. He is the author of several scientific articles and international patents, and also authored a book, "Introduction to Algorithmic Marketing: Artificial Intelligence for Marketing Operations" (2017).
