San Francisco, California
London, United Kingdom
This two-day course provides an intensive introduction to HBase for developers. Students will learn how to use the HBase shell and Java APIs; how to design, create, and manage tables; effective schema design; performance and management considerations; and other topics.
The course will be taught using a Linux environment. Most HBase programming is done using Java APIs. Therefore, students in this class need to meet the following prerequisites:
- Programming experience: The ability to use a programmer’s text editor or IDE to edit code is required.
- Java experience: The ability to write and build Java applications is required.
- Linux shell experience: The ability to log into Linux machines and use basic Linux shell (bash) commands is required.
What You Must Bring
Bring your laptop with the following software installed in advance.
- PuTTY (Windows only): Students will log into a cluster for the course. Mac OS X and Linux environments include ssh (secure shell) support. Windows users will need to install PuTTY from the putty.zip file.
- Editor: A programmer’s source code editor (e.g., Eclipse, WordPad, Notepad++, but not Notepad). Eclipse is strongly recommended for the Java-based exercises.
What You Will Learn
Think Big Academy courses teach by doing: short lectures are interspersed with hands-on exercises. By the end of the course, you will have learned the following:
- How HBase and other NoSQL databases differ from traditional relational databases
- Using the HBase shell to check system status, manipulate tables, and access data
- Writing Java applications to access data in HBase
- Writing MapReduce jobs that read and write HBase data
- Efficient and effective schema design
- Using Avro to store complex data types
- Understanding backup and replication options
- Basic system configuration and tuning
The particular agenda for each day may be adjusted according to student interests, pace, and other considerations.
- Course Overview
- Big Data Overview
- Introduction to Hadoop, MapReduce, and HDFS
- Introduction to HBase
- HBase Components
- HBase Data Model
- Create, Read, Update, and Delete (“CRUD”) in HBase
- Scaling HBase Operations
- Tool Overview
- HBase Shell
- Exercise: Using the HBase Shell
- Web Interface
- Programmatic Interface (API)
HBase Java APIs
- API Overview
- Administrative Functions via HBaseAdmin
- Exercise: Creating and Deleting Tables
- Accessing HBase with HTable and HTablePool
- Writing Data with Put
- Exercise: Populating an HBase table
- Reading Data
- Exercise: HBase Get and Scan
- Deletes and Tombstone Markers
- Exercise: Deleting Data
- Using HBase with MapReduce
- Exercise: Loading Data via a MapReduce Job
- Exercise: Computing Statistics in a MapReduce Job
- Review of Relational Data Modeling
- Introduction to NoSQL Data Models
- Row Key Selection Considerations
- Exercise: Schema Design
- Advanced Schema Design
- Storing Complex Data Types with Avro
- Exercise: MapReduce Stock Data Loader with Avro
- Exercise: Advanced Schema Design
- Secondary Indexing
HBase Tuning and Administration
- Introduction to Disaster Recovery, RPO and RTO
- HBase Backup and Replication Options
- HBase Server Configuration Basics
- Java VM Garbage Collection Tuning
- Monitoring HBase