HDP Developer: Apache Pig and Hive - Hortonworks Official Curriculum
Event Information
Description
COURSE OVERVIEW
This four-day training course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core and Spark SQL.
COURSE CONTENT
DAY 1: AN INTRODUCTION TO THE HADOOP DISTRIBUTED FILE SYSTEM
OBJECTIVES
- Understanding Hadoop
- The Hadoop Distributed File System
- Ingesting Data into HDFS
- The MapReduce Framework
LABS
- Starting an HDP Cluster
- Demonstration: Understanding Block Storage
- Using HDFS Commands
- Importing RDBMS Data into HDFS
- Exporting HDFS Data to an RDBMS
- Importing Log Data into HDFS Using Flume
- Demonstration: Understanding MapReduce
- Running a MapReduce Job
DAY 2: AN INTRODUCTION TO APACHE PIG
OBJECTIVES
- Introduction to Apache Pig
- Advanced Apache Pig Programming
LABS
- Demonstration: Understanding Apache Pig
- Getting Started with Apache Pig
- Exploring Data with Apache Pig
- Splitting a Dataset
- Joining Datasets with Apache Pig
- Preparing Data for Apache Hive
- Demonstration: Computing Page Rank
- Analyzing Clickstream Data
- Analyzing Stock Market Data Using Quantiles
DAY 3: AN INTRODUCTION TO APACHE HIVE
OBJECTIVES
- Apache Hive Programming
- Using HCatalog
- Advanced Apache Hive Programming
LABS
- Understanding Hive Tables
- Understanding Partition and Skew
- Analyzing Big Data with Apache Hive
- Demonstration: Computing NGrams
- Joining Datasets in Apache Hive
- Computing NGrams of Emails in Avro Format
- Using HCatalog with Apache Pig
DAY 4: WORKING WITH SPARK CORE, SPARK SQL AND OOZIE
OBJECTIVES
- Advanced Apache Hive Programming (Continued)
- Hadoop 2 and YARN
- Introduction to Spark Core and Spark SQL
- Defining Workflow with Oozie
LABS
- Advanced Apache Hive Programming
- Running a YARN Application
- Getting Started with Apache Spark
- Exploring Apache Spark SQL
- Defining an Apache Oozie Workflow