$2,824.95

Multiple Dates

HDP Developer: Quick Start - Hortonworks Official Curriculum

Location

Hong Kong

Description

COURSE OVERVIEW

This 4-day training course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive, and to develop applications on Apache Spark.

Topics include: an essential understanding of HDP and its capabilities, Hadoop, YARN, HDFS, MapReduce/Tez, data ingestion, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core, Spark SQL, Apache Zeppelin, and additional Spark features.


COURSE CONTENT

DAY 1: AN INTRODUCTION TO APACHE HADOOP AND HDFS

OBJECTIVES

  • The Case for Hadoop

  • The Hadoop Ecosystem

  • The HDFS Architecture

  • Ingesting Data Into HDFS

  • Parallel Processing Fundamentals

  • YARN Architecture

  • Introduction to Apache Pig

LABS

  • Starting an HDP Cluster

  • Using HDFS Commands

  • Demonstration: Understanding Apache Pig

  • Getting Started with Apache Pig

  • Exploring Data with Pig

DAY 2: ADVANCED APACHE PIG PROGRAMMING

OBJECTIVES

  • Advanced Apache Pig Programming

  • Introduction to Apache Hive

  • Using HCatalog

LABS

  • Splitting a Dataset

  • Joining Datasets

  • Preparing Data for Apache Hive

  • Understanding Apache Hive Tables

  • Demonstration: Understanding Partitions and Skew

  • Analyzing Big Data with Apache Hive

  • Demonstration: Computing NGrams

  • Joining Datasets in Apache Hive

  • Computing NGrams of Emails in Avro Format

  • Using HCatalog with Apache Pig

DAY 3: ADVANCED APACHE HIVE PROGRAMMING

OBJECTIVES

  • Advanced Apache Hive Programming

  • An Overview of Apache Zeppelin and Apache Spark

  • An Introduction to RDD Programming

  • An Introduction to Pair RDDs

LABS

  • Advanced Apache Hive Programming

  • Introduction to Apache Spark REPLs and Apache Zeppelin

  • Creating and Manipulating RDDs

  • Creating and Manipulating Pair RDDs

DAY 4: WORKING WITH PAIR RDDS AND BUILDING YARN APPLICATIONS

OBJECTIVES

  • An Introduction to Pair RDDs (Continued)

  • An Introduction to Spark SQL

  • Caching and Persisting

  • Building and Submitting Applications to YARN


LABS

  • Creating and Saving DataFrames and Tables

  • Working with DataFrames

  • Building and Submitting Applications to YARN
