Advanced Hadoop for Developers (HD-ADV)


Course Content

Advanced Hadoop for Developers is a hands-on, three-day course that teaches advanced programming techniques in Pig, Hive, the Hadoop Distributed File System (HDFS), and HBase. The Apache Hadoop software library is a framework for the distributed processing of large data sets across clusters of computers using simple programming models. This course follows Hadoop for Developers (HD) and continues your deep dive into Hadoop development. It focuses on advanced data management in HDFS, complex structured data in Pig, performance tuning in Hive, and advanced HBase techniques.

Who should attend

Big Data developers and engineers.

Prerequisites

  • Comfortable with the Java programming language, as most programming exercises are in Java
  • Comfortable navigating the Linux command line and editing files using vi or nano
  • Have attended Hadoop for Developers (HD) or have equivalent knowledge

Detailed Course Outline

Module 1: Data Management in HDFS

  • Various data formats (JSON / Avro / Parquet)
  • Compression schemes
  • Data masking
  • Labs

Module 2: Advanced Pig

  • User-defined functions
  • Introduction to Pig libraries (Elephant Bird / DataFu)
  • Loading complex structured data using Pig
  • Pig tuning
  • Labs

Module 3: Advanced Hive

  • User-defined functions
  • Compressed tables
  • Hive performance tuning
  • Labs

Module 4: Advanced HBase

  • Advanced schema modelling
  • Compression
  • Bulk data ingest
  • Wide-table / tall-table comparison
  • HBase and Pig
  • HBase and Hive
  • HBase performance tuning
  • Labs

Classroom Training

Duration 3 days

Price
  • United States: US$ 2,500

Online Training

Duration 3 days

Price
  • United States: US$ 2,500