Hadoop for Developers (HD)


About this Course

Apache Hadoop is a widely used open-source framework for processing Big Data on clusters of servers. Fast Lane’s Hadoop for Developers is a deep, hands-on immersion into essential Hadoop concepts and skills. The course covers the core skills Hadoop developers need, including horizontal scaling with the Hadoop Distributed File System (HDFS), working with the MapReduce programming model, leveraging Pig and Pig Latin, implementing Apache Hive, and understanding the architecture and capabilities of HBase.

Who should attend

Developers who want to learn about Hadoop and Big Data

Class Prerequisites

Learners should meet the following prerequisites before class:

  • Comfort with the Java programming language (most programming exercises are in Java)
  • Ability to navigate the Linux command line
  • Basic knowledge of an IDE such as Eclipse or a Linux editor (vi / nano) for modifying code

What You Will Learn

By the end of this course, you should be able to:

  • Understand the history, concepts and high-level architecture of Hadoop
  • Develop data architectures using the Hadoop Distributed File System (HDFS)
  • Program using MapReduce for various use cases
  • Use Pig Latin to define job flows that run as MapReduce jobs
  • Compare and contrast Hive and SQL for Big Data implementations
  • Design schemas using HBase

Outline: Hadoop for Developers (HD)

Module 1: Introduction to Hadoop
  • Hadoop history and concepts
  • Ecosystem and distributions
  • High-level architecture
  • Hadoop myths & challenges
  • Hardware / software
Module 2: HDFS
  • Concepts (horizontal scaling, replication, data locality, rack awareness)
  • Architecture
  • NameNodes and DataNodes
  • Communications / heartbeats
  • Block manager / balancer
  • Health check / safemode
  • Read / write path
  • File system abstractions
  • Data integrity
  • NameNode HA, federation
  • Lab exercises
Module 3: MapReduce
  • MapReduce concepts
  • Daemons: JobTracker / TaskTracker
  • Phases: driver, mapper, shuffle/sort, reducer
  • Counters, combiners
  • Distributed cache
  • MapReduce configuration
  • MR types and formats
  • Sorting and joins
  • Job schedulers & unit testing
  • Thinking in MapReduce
  • Future of MapReduce (YARN)
  • Lab exercises
Module 4: Pig
  • Pig vs. Java MapReduce
  • Pig job flow
  • Pig Latin language
  • Lab exercises
Module 5: Hive
  • Hive concepts
  • Architecture
  • Data types
  • Hive vs. SQL
  • Lab exercises
Module 6: HBase
  • Intro
  • Concepts
  • Architecture
  • HBase vs. RDBMS
  • Read path / write path
  • Schema design
  • Lab exercises
Classroom Training

Duration 4 days

  • United States: US$ 3,250
Online Training

Duration 4 days

  • United States: US$ 3,250