Apache Spark Developer Training

Location

Agilitics Pte. Ltd., Noida

G-284, Sector 63

Noida, Uttar Pradesh 201301

India

Description

Overview

This four-day Spark Developer course is for data engineers, analysts, architects, software engineers, IT operations staff, and technical managers interested in a thorough, hands-on overview of Apache Spark.

The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs.

Objectives

After taking this class you will be able to:

  • Describe Spark’s fundamental mechanics
  • Use the core Spark APIs to operate on data
  • Articulate and implement typical use cases for Spark
  • Build data pipelines with SparkSQL and DataFrames
  • Analyze Spark jobs using the UIs and logs
  • Create Streaming and Machine Learning jobs

Prerequisites:

Required

  • Basic to intermediate Linux knowledge, including the ability to use a text editor, such as vi
  • Familiarity with basic commands such as mv, cp, ssh, grep, cd, useradd
  • Knowledge of application development principles

Recommended

  • Knowledge of functional programming
  • Knowledge of Scala or Python
  • Beginner fluency with SQL

Course Overview

Lesson 1 – Introduction to Apache Spark

  • Describe the features of Apache Spark
  • Advantages of Spark
  • How Spark fits in with the Big Data application stack
  • How Spark fits in with Hadoop
  • Define Apache Spark components

Lesson 2 – Load and Inspect Data in Apache Spark

  • Describe different ways of getting data into Spark
  • Create and use Resilient Distributed Datasets (RDDs)
  • Apply transformation to RDDs
  • Use actions on RDDs
  • Load and inspect data in RDD
  • Cache intermediate RDDs
  • Use Spark DataFrames for simple queries
  • Load and inspect data in DataFrames
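
To give a feel for the Lesson 2 topics above, here is a minimal sketch using Spark's Scala API (the course allows Scala or Python); the object name and sample data are illustrative only:

```scala
import org.apache.spark.sql.SparkSession

object LoadAndInspect {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-and-inspect")
      .master("local[*]")          // local mode, for experimentation
      .getOrCreate()
    import spark.implicits._

    // Create an RDD, apply a lazy transformation, cache it, then run an action.
    val nums  = spark.sparkContext.parallelize(1 to 100)
    val evens = nums.filter(_ % 2 == 0)  // transformation: nothing runs yet
    evens.cache()                        // keep the intermediate RDD in memory
    println(evens.count())               // action: triggers the computation

    // The same data as a DataFrame, for a simple query.
    val df = (1 to 100).toDF("n")
    df.filter($"n" % 2 === 0).show(5)

    spark.stop()
  }
}
```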

Lesson 3 – Build a Simple Apache Spark Application

  • Define the lifecycle of a Spark program
  • Define the function of SparkContext
  • Create the application
  • Define different ways to run a Spark application
  • Run your Spark application
  • Launch the application
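
As an illustration of the Lesson 3 lifecycle, a sketch of a classic standalone word count; the input path comes from the command line, and all names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // The SparkSession wraps the SparkContext, the entry point to the cluster.
    val spark = SparkSession.builder().appName("word-count").getOrCreate()
    val sc = spark.sparkContext

    val counts = sc.textFile(args(0))     // input path supplied on the command line
      .flatMap(_.split("\\s+"))           // lines -> words
      .map(word => (word, 1))
      .reduceByKey(_ + _)                 // sum the 1s per word

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```

Packaged as a JAR, an application like this would typically be launched with something like `spark-submit --class WordCount app.jar input.txt` (paths illustrative).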

Lesson 4 – Work with PairRDD

  • Review loading and exploring data in RDD
  • Load and explore data in RDD
  • Describe and create Pair RDD
  • Create and explore PairRDD
  • Control partitioning across nodes
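
To preview the pair RDD and partitioning topics above, a small sketch; the data and partition count are made up:

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object PairRddDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pair-rdd-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // An RDD of (key, value) tuples is a pair RDD.
    val sales = sc.parallelize(Seq(("apples", 3), ("pears", 2), ("apples", 5)))

    // Control how the pairs are spread across partitions (and hence nodes).
    val partitioned = sales.partitionBy(new HashPartitioner(4))
    val totals = partitioned.reduceByKey(_ + _)  // combines within partitions first

    totals.collect().foreach(println)
    spark.stop()
  }
}
```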


Lesson 5 – Work with DataFrames

  • Create DataFrames
      ◦ From existing RDD
      ◦ From data sources
  • Work with data in DataFrames
      ◦ Use DataFrame operations
      ◦ Use SQL
      ◦ Explore data in DataFrames
  • Create user-defined functions (UDF)
      ◦ UDF used with Scala DSL
      ◦ UDF used with SQL
      ◦ Create and use user-defined functions
  • Repartition DataFrames
  • Supplemental Lab: Build a standalone application
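
A compact sketch touching the DataFrame topics above: creation from an RDD, the Scala DSL and SQL, a UDF registered both ways, and repartitioning. All names and data are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object DataFrameDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-demo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // DataFrame from an existing RDD.
    val rdd = spark.sparkContext.parallelize(Seq(("alice", 34), ("bob", 28)))
    val df  = rdd.toDF("name", "age")

    // A UDF used with the Scala DSL...
    val shout = udf((s: String) => s.toUpperCase)
    df.select(shout($"name").as("name"), $"age").show()

    // ...and the same logic registered as a UDF for SQL.
    spark.udf.register("shout", (s: String) => s.toUpperCase)
    df.createOrReplaceTempView("people")
    spark.sql("SELECT shout(name) AS name, age FROM people WHERE age > 30").show()

    // Repartitioning a DataFrame across more partitions.
    println(df.repartition(8).rdd.getNumPartitions)  // prints 8

    spark.stop()
  }
}
```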

Lesson 6 – Monitor Apache Spark Applications

  • Describe components of the Spark execution model
  • Use Spark Web UI to monitor Spark applications
  • Debug and tune Spark applications
  • Use the Spark Web UI
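
As a sketch of how an application surfaces in these monitoring tools: the driver serves a Web UI (http://localhost:4040 by default while the application runs), and event logging keeps finished applications visible in the History Server. The app name, job description, and log directory below are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

object MonitoredApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("monitored-app")                          // the name shown in the Web UI
      .master("local[*]")
      .config("spark.eventLog.enabled", "true")          // persist events for the History Server
      .config("spark.eventLog.dir", "/tmp/spark-events") // placeholder directory; must exist
      .getOrCreate()

    // While this runs, the driver's Web UI is at http://localhost:4040 by default.
    spark.sparkContext.setJobDescription("demo count")   // label that appears in the Jobs tab
    println(spark.range(1000000).count())
    spark.stop()
  }
}
```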

Lesson 7 – Introduction to Apache Spark Data Pipelines

  • Identify components of Apache Spark Unified Stack
  • List benefits of Apache Spark over Hadoop ecosystem
  • Describe data pipeline use cases

Lesson 8 – Create an Apache Spark Streaming Application
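
As a taste of this topic, here is a minimal Structured Streaming word count reading from a local socket (for example one fed by `nc -lk 9999`); the host, port, and object name are illustrative, and a class like this may equally cover the older DStream API:

```scala
import org.apache.spark.sql.SparkSession

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-word-count")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Read an unbounded stream of lines from a local socket.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // A running word count over the stream.
    val counts = lines.as[String]
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()

    counts.writeStream
      .outputMode("complete")  // re-emit the full counts table on every trigger
      .format("console")
      .start()
      .awaitTermination()
  }
}
```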
