$243.90 – $313.75

Apache Spark Developer Training

Event Information

Date and Time

Location

Agilitcs Pte. Ltd. Noida

G-284, Sector 63

Noida, Uttar Pradesh 201301

India

Event description

Overview

This four-day Spark Developer course is for data engineers, analysts, architects, software engineers, IT operations staff, and technical managers interested in a thorough, hands-on overview of Apache Spark.

The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs.

Objectives

After taking this class you will be able to:

  • Describe Spark’s fundamental mechanics
  • Use the core Spark APIs to operate on data
  • Articulate and implement typical use cases for Spark
  • Build data pipelines with Spark SQL and DataFrames
  • Analyze Spark jobs using the UIs and logs
  • Create Streaming and Machine Learning jobs

Prerequisites

Required

  • Basic to intermediate Linux knowledge, including the ability to use a text editor such as vi
  • Familiarity with basic commands such as mv, cp, ssh, grep, cd, and useradd
  • Knowledge of application development principles

Recommended

  • Knowledge of functional programming
  • Knowledge of Scala or Python
  • Beginner fluency with SQL

Course Overview

Lesson 1 – Introduction to Apache Spark

  • Describe the features of Apache Spark
  • Advantages of Spark
  • How Spark fits in with the Big Data application stack
  • How Spark fits in with Hadoop
  • Define Apache Spark components

Lesson 2 – Load and Inspect Data in Apache Spark

  • Describe different ways of getting data into Spark
  • Create and use Resilient Distributed Datasets (RDDs)
  • Apply transformations to RDDs
  • Use actions on RDDs
  • Load and inspect data in RDD
  • Cache intermediate RDDs
  • Use Spark DataFrames for simple queries
  • Load and inspect data in DataFrames

Lesson 3 – Build a Simple Apache Spark Application

  • Define the lifecycle of a Spark program
  • Define the function of SparkContext
  • Create the application
  • Define different ways to run a Spark application
  • Run your Spark application
  • Launch the application

Lesson 4 – Work with PairRDD

  • Review loading and exploring data in RDD
  • Load and explore data in RDD
  • Describe and create PairRDD
  • Create and explore PairRDD
  • Control partitioning across nodes


Lesson 5 – Work with DataFrames

  • Create DataFrames
      • From existing RDD
      • From data sources
  • Work with data in DataFrames
      • Use DataFrame operations
      • Use SQL
      • Explore data in DataFrames
  • Create user-defined functions (UDF)
      • UDF used with Scala DSL
      • UDF used with SQL
      • Create and use user-defined functions
  • Repartition DataFrames
  • Supplemental Lab: Build a standalone application

Lesson 6 – Monitor Apache Spark Applications

  • Describe components of the Spark execution model
  • Use Spark Web UI to monitor Spark applications
  • Debug and tune Spark applications
  • Use the Spark Web UI

Lesson 7 – Introduction to Apache Spark Data Pipelines

  • Identify components of Apache Spark Unified Stack
  • List benefits of Apache Spark over Hadoop ecosystem
  • Describe data pipeline use cases

Lesson 8 – Create an Apache Spark Streaming Application