Getting Started with RHadoop

Think Big Analytics

Monday, February 25, 2013 from 9:00 AM to 4:00 PM (PST)

Ticket Information

Ticket Type: Full Program
Sales End: Ended
Price: $795.00
Fee: $0.00

Event Details

Getting Started with RHadoop is designed to help data analysts skilled in R apply their existing knowledge in the Big Data environment of Hadoop.

The class is taught by Jeffrey Breen, Principal at Think Big Analytics, and consists of a mixture of lecture and hands-on examples and exercises. Previous knowledge of Hadoop is not necessary, but you should be comfortable using R interactively from a command shell in addition to a GUI. We will work all examples on a Hadoop cluster (provided).


Overview & Introduction

  • R & Hadoop
  • RHadoop Package Overview
  • RHadoop Advantages
  • rmr2 Advantages
  • Installation and configuration
  • RHadoop Prerequisites
  • Downloading RHadoop

Using RHadoop

rhdfs


  • Installation
  • Function Overview

Example: Populate HDFS

Example: Checking the results

rmr2


  • Installation
  • Installation test: Example 0
  • Function Overview

Example: wordcount: the “hello world” of Hadoop

  • code
  • mapper
  • reducer
  • combiner
  • submit job and fetch results
  • Exercise: get rid of the punctuation

Example: airports

  • about the data
  • selecting meaningful keys and values
  • writing an input formatter
  • mapper
  • reducer
  • submit job and fetch results
  • execution notes
  • Exercise: repeat analysis by year and airline

Example: user-based collaborative filtering

  • about the data
  • the algorithm
  • step 1: input formatter
  • step 1: mapper
  • step 1: reducer
  • step 2: reducer
  • submit the jobs and fetch results
  • Exercise: find nearest neighbors

rhbase


  • Installation
  • Installation test: Example 0
  • Function Overview

Example: tweets

write Twitter status messages to HBase

Example: Twitter users

store user information for tweet authors

Wrap-up and Q&A

When & Where

Revolution Analytics
101 University Avenue
Palo Alto, CA 94301

Monday, February 25, 2013 from 9:00 AM to 4:00 PM (PST)