Hadoop Class – Advanced & Performance Tuning
Hadoop jobs that are long-running or recurring can benefit from
tuning, but the space of possibilities is very large. The primary focus is on how tuning a
distributed program differs from tuning a standalone one. Tuning parameters at the JVM, cluster, data,
and application levels are described, along with measurement techniques,
extensions, and heuristics for converging on a reasonable result.
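As a taste of what these levels look like in practice, the sketch below shows per-job overrides passed at submission time. It is illustrative only: the jar and class names are hypothetical, the property names are from the Hadoop 1.x era (later versions rename many of them), and the values are placeholders, not recommendations.

```shell
# Illustrative per-job tuning overrides (Hadoop 1.x era property names;
# e.g., mapred.reduce.tasks later became mapreduce.job.reduces).
#   mapred.reduce.tasks     application level: number of reduce tasks
#   mapred.child.java.opts  JVM level: per-task heap size
#   io.sort.mb              data/shuffle level: map-side sort buffer (MB)
# myjob.jar and MyJob are placeholders for a student's own job.
hadoop jar myjob.jar MyJob \
  -D mapred.reduce.tasks=32 \
  -D mapred.child.java.opts=-Xmx1024m \
  -D io.sort.mb=256 \
  input/ output/
```

The class examines how to measure whether overrides like these actually help, rather than guessing at values.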
Paul Baclace worked with Hadoop creator Doug Cutting at the Internet Archive (archive.org) to use Hadoop and Nutch to build an index to 4.6TB of web pages for the US National Archives in 2005. Paul has contributed patches to Nutch and Hadoop, and more recently consulted for AT&T Interactive and several startups to take advantage of big data processing in the cloud and on in-house servers. Paul has over 20 years of experience and a B.S. in C.S. from the rigorous Rensselaer Polytechnic Institute, Troy, NY.
Each student will be allocated a temporary cluster for performing tuning experiments.
Prerequisites:
· Should have previously run Hadoop jobs, at minimum modified versions of the Java example programs.
· Basic Linux command line skills.
· Laptop with WiFi for connecting to lab systems.
Audience: Developers with Linux command-line experience.
For further information, please contact:
Big Data classes offered “group study” style, with dedicated clusters in hands-on labs, conducted by industry-experienced gurus with a “from-the-trenches” approach.
For full details, please visit the site.