Apache Hadoop (http://hadoop.apache.org) has rapidly become the platform of choice for data-intensive supercomputing around the world. Yahoo! is one of the main contributors to Apache Hadoop and its largest user in the world.
In this tutorial, we will learn how Yahoo! uses Hadoop to solve real-world big data problems. We will see how easy it is to develop MapReduce applications in Hadoop by using a high-level dataflow language such as Pig Latin. We will get an in-depth look at Hadoop core components, such as the Hadoop Distributed File System (HDFS) and the MapReduce programming framework.
This talk will be conducted by Dr. Milind Bhandarkar.
The evening will consist of two 90-minute talks, with a break in between. Food, drinks, and networking will round off the evening.
The venue is the Benjamin Franklin Suite at The RSA, near Charing Cross Station (http://www.thersa.org/contact).
Schedule: Wednesday 22nd September
4:00pm: Registration and refreshments
4:30pm: Talk 1 - Overview of Apache Hadoop & MapReduce Programming
Apache Hadoop has rapidly become the platform of choice for data-intensive supercomputing. At Yahoo!, Hadoop runs on more than 38,000 servers, stores more than 170 petabytes of data, and performs millions of big data analytics computations every month. In this talk, we describe the two core components of Hadoop: the Distributed File System and the MapReduce programming framework. We will learn how to program using the Hadoop MapReduce framework, with numerous examples from real-world usage of Hadoop at Yahoo!.
6:00pm: Break with refreshments
6:30pm: Talk 2 - Introduction to Pig Programming
Apache Pig is a parallel dataflow system that uses Hadoop as its backend distributed computation platform. More than 75% of Hadoop jobs at Yahoo! are invoked with Pig. In this talk, we introduce the dataflow language, Pig Latin. We will learn about the simplicity, flexibility, and configurability of Pig. We will describe the Pig architecture and how Pig dataflow programs are executed on the Hadoop MapReduce platform, with real examples.
8:00pm: Food and Drinks
The event is free. Please register only if you plan to attend, as space is limited.
The Yahoo Developer Network
About Milind Bhandarkar
Dr. Milind Bhandarkar has been working with Hadoop and Pig since version 0.1.0 of both. He started the Yahoo! Grid Solutions team, which focuses on training, consulting, and supporting thousands of new migrants to Hadoop and Pig. He has been focused on parallel programming languages and paradigms for over 20 years, and has a PhD in that field from the University of Illinois at Urbana-Champaign, USA. He worked at the Center for Development of Advanced Computing (C-DAC), the Center for Simulation of Advanced Rockets, Siebel Systems, and Pathscale before settling at Yahoo! in 2005. As Hadoop Solutions Architect at Yahoo!, Milind has enabled several mission-critical projects at Yahoo! to adopt (and adapt to) Apache Hadoop.
About The Yahoo! Developer Network
YDN’s mission is to inspire developers, accelerate Y! innovation, and be the daily go-to platform for developers globally, inside and outside of Yahoo!. We encourage developers to sample a buffet of the best technologies, not only from Yahoo! but also from other major tool providers. This allows developers to innovate across multiple platforms and use a wide variety of technologies, so that together everyone can push the envelope of what's possible. Every month, more than 550,000 developers visit YDN to socialize, build, and learn. Our community also hosts exciting large-scale events around the world, such as Open Hack (for all developers) and HackU's (for universities), where developers of all backgrounds are invited to collaborate in person through code and creativity. We look forward to having you join us!