Apache Spark™ Community Event
IBM | SPARK
The start of something big in data and design.
As we race into the age of the digital universe, every interaction, every sentiment, every environmental factor is a known quantity.
Apache Spark™ has made it possible for data scientists to build models quickly and iterate faster, so more people can apply deep intelligence to every application, including IoT, machine learning, web, mobile, social, business process, and more.
Join fellow data scientists on June 15th at Galvanize for a Spark community event and hear how IBM and Spark are changing data science and propelling the Insight Economy forward.
6:00pm - 7:00pm
Reception, Demos, and Innovation Wall
Share a poster on your Spark research or application on our Innovation Wall! Please indicate your topic on the registration form to confirm your participation.
7:00pm - 8:00pm
Town Hall Panel of Data Scientists
See below for our great lineup of panelists!
8:00pm - 8:45pm
Lightning Talks: Community Spark Solutions
See below for our speakers!
Can't attend in person?
We'll be live streaming this event to a computer screen near you! Register for the live stream and we'll send you a calendar update and the login link.
Visit IBM's Spark resources page to find an online Spark course and relevant meetups, hackathons, and conferences in your neighborhood.
We're excited to hear from some of our favorite data scientists during our Town Hall Panel.
Join us for 5-minute lightning talks to hear about the exciting Spark projects that community members are working on. Check back for talk topics coming soon!
- Alec McGail, Systems Engineer at Perscio
Perscio data scientists recently won the Big Data for Social Good Global Challenge. Their application "Watch Flu Spread," built on IBM Analytics for Hadoop, wowed fellow data scientists and judges from around the globe. Alec McGail, a Spark enthusiast and data scientist, will discuss the innovation happening today with Spark and his recent project, which involved ingesting 170 GB of data into Spark and running analytics on it in one hour for $5.
- Gerard Maas, Lead Data Processing at Virdata
Apache Spark: One framework to rule them all. Or how we unify batch, streaming, and ad-hoc analytics at Virdata. Apache Spark is a cluster computing framework for big data processing that provides a unified model applicable to batch, streaming, machine learning, and graph analytics. In this lightning talk we will present how Virdata has adopted Apache Spark in our IoT platform, where we use Spark and Spark Streaming to form a so-called Lambda architecture that processes the massive IoT data stream coming into our system. We can then connect this data with our Spark-as-a-Service offering, which enables data scientists to do ad-hoc exploration and analysis, build models, and create value from the data.
- Dr. Timothy Howes, Chief Technology Officer at ClearStory Data
ClearStory Data has been building its next-generation analytics solution on Spark since Spark was still in the AMPLab. With deep roots in Spark and Spark-based data analysis, Dr. Timothy Howes will address the merits of Spark, its proven advantages, and its trajectory, which stands to dramatically advance analytics on big, diverse data. Specifically, Dr. Howes will cover how ClearStory's implementation of Spark is used to:
1. Speed data inference and data heuristics over large, diverse data, reducing data prep times
2. Automate data blending via 'intelligent data harmonization', so more diverse data can be easily blended
3. Enable interactive analysis and exploration, for both real-time data sources and batch updates
4. Manage data lineage from data sources, through inference and data harmonization, to interactive insights
- Robert Parkin, Principal Scientist, IBM Commerce at IBM
Rob will be talking about how IBM Commerce is building out a new analytics platform for marketing and merchandising based on Spark technology. He will talk briefly about how his team is using Spark to produce dramatic performance improvements in Commerce’s current merchandising applications and how they are using it to create new applications that can integrate data and analytics across different marketing channels. He will describe an exciting new product called Customer Journey Analytics that uses data across a variety of different sources to enable marketers to deliver more relevant, tailored personal experiences to their customers.
- Luis Arellano, Program Director, IBM Cloud Data Services at IBM
Luis will share IBM's latest Spark plans from the newly formed Cloud Data Services division, touching on highlights and on how customers can get involved.
- Peter Schlampp, Vice President of Products at Platfora
Platfora, the leading end-to-end Big Data Discovery platform, is leveraging Spark to disrupt the current analytics paradigms. For data scientists, Spark is easier to use than MapReduce, allowing programming in more languages and offering pre-packaged libraries for complex analytics and graph processing. However, Spark is only a processing framework and not a platform for integration and collaboration between the players in the pipeline. Furthermore, business analysts who rely on data scientists can get caught waiting for more data as they dive deeper into their analysis. In this talk, Platfora will demonstrate how the traditional data analysis pipeline can be accelerated through their platform and how the integration of Spark will put advanced analytics in the hands of both data scientists and analysts.
- Pablo Tapia, CEO at Tupl Inc.
The wireless telecom sector is experiencing very fast-paced technological development, with 4G technologies being deployed, networks becoming denser, and the Internet of Things becoming a reality. In the face of this change, operators' backend systems have become obsolete and cannot cope with the more sophisticated actions required to operate networks efficiently. In this talk we will present TuplOS, a revolutionary architecture that leverages multiple components of Spark to deliver a powerful and scalable backend system that significantly improves operational efficiency for telecom carriers.
- Darwin Leung, Director of Informatics at Independence Blue Cross
Darwin will discuss challenges in healthcare and how big data solutions can help improve revenue and reduce costs while also improving patient medical outcomes.
- Olly Stephens, Engineering Systems Architect at ARM
Electronic Design Automation (EDA) is a traditional high performance compute industry; designing and verifying a hardware component requires huge numbers of compute cycles and produces a large amount of diverse data. It’s probably not a ‘big data’ problem by comparison with some industries, but it’s certainly a ‘wide data’ problem, and a particular challenge to process. We have been championing the use of data processing pipelines to improve the efficiency of our work internally, and are successfully using Spark in both its streaming and batch processing form to build intelligent governance systems for internal infrastructure, and complex analytic reporting dashboards for internal projects.
Network with Spark community leaders including Alpine Data Labs, ClearStory Data, Databricks, Looker, Platfora, Zoomdata, and others.
Parking will be provided by IBM. To qualify for validated parking, you must indicate your interest on the registration form. The parking lot is located at Towne Park Public Parking, 350 2nd Street, San Francisco. You will be given a validated ticket when you check in at the event. Space is limited.
Apache®, Apache Spark™, and Spark™ are trademarks of the Apache Software Foundation in the United States and/or other countries.
Have questions about the Apache Spark™ Community Event?
Contact IBM Analytics