Apache Accumulo Training (AAT)


Course Content

Accumulo is a distributed NoSQL database built on Hadoop. One of its most attractive features is a built-in security model that provides cell-level access control. This makes it possible to build Big Data applications in which data access and security are critical. This 2-day course covers in detail how to use Accumulo to define and achieve data analytics goals. Both days are intensively hands-on.
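
To make the idea of cell-level access control concrete, here is a minimal sketch in plain Java. It is a toy model and does not use the Accumulo client API: the class `CellVisibilitySketch`, its methods, and the sample patient data are all hypothetical. It also assumes a simplified label syntax in which a label is a list of tokens joined by `&`, all of which are required. Accumulo's own visibility expressions are richer and also allow `|` and parentheses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Toy model of cell-level access control; this is NOT the Accumulo client API.
class CellVisibilitySketch {

    // One stored cell: row, column, value, and a visibility label.
    // Simplifying assumption: the label is a list of tokens joined by '&'
    // (all are required); an empty label means the cell is visible to everyone.
    static final class Cell {
        final String row;
        final String column;
        final String value;
        final String visibility;

        Cell(String row, String column, String value, String visibility) {
            this.row = row;
            this.column = column;
            this.value = value;
            this.visibility = visibility;
        }
    }

    // Returns true when the reader's authorizations cover every token in the label.
    static boolean canRead(String visibility, Set<String> authorizations) {
        if (visibility.isEmpty()) {
            return true;
        }
        return authorizations.containsAll(Arrays.asList(visibility.split("&")));
    }

    // Scans the table and returns only the cells the reader is allowed to see.
    static List<Cell> scan(List<Cell> table, Set<String> authorizations) {
        List<Cell> visible = new ArrayList<>();
        for (Cell cell : table) {
            if (canRead(cell.visibility, authorizations)) {
                visible.add(cell);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: three cells in one row, each with its own label.
        List<Cell> table = List.of(
            new Cell("patient42", "name", "Alice", ""),
            new Cell("patient42", "diagnosis", "flu", "medical"),
            new Cell("patient42", "billing", "overdue", "finance&admin"));

        // A reader holding only the "medical" authorization is filtered per cell.
        for (Cell cell : scan(table, Set.of("medical"))) {
            System.out.println(cell.column + " = " + cell.value);
        }
    }
}
```

The point of the sketch is that filtering happens per cell rather than per table or per row: two readers scanning the same row can receive different subsets of its columns, depending on the authorizations they present.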

Who should attend



Learners should meet the following prerequisites before coming to class:

  • Comfortable with the Java programming language (most programming exercises are in Java)
  • Able to navigate the Linux command line
  • Basic knowledge of Linux editors (vi / nano) for modifying code

Course Objectives

By the end of this course, you should be able to:

  • Understand the fundamentals of Accumulo, including architecture, installation, and setup
  • Write and read data with the Accumulo APIs, including batch operations
  • Design with patterns to achieve flexible schemas
  • Integrate Accumulo with Hadoop
  • Configure server-side optimizations
  • Work with cells and partitions
  • Perform statistical analysis with data retrieval patterns
  • Apply Accumulo to data science tasks

Detailed Course Outline

Module 1: Introduction to Accumulo
  • NoSQL concepts
  • Other NoSQL datastores
  • What is special about Accumulo: design goals and implementation
Module 2: Installation and quick start
  • Environment prerequisites
  • Accumulo configuration
  • Process control scripts
  • Shell and monitoring tools
  • Lab
Module 3: Accumulo architecture
  • Key/Value spaces
  • Range scans and filtering
  • Tables and tablets
  • Internal Accumulo communication
  • Anatomy of reads and writes
Module 4: Writing and reading with API
  • Row keys, row values
  • Mutations
  • Instances and connectors
  • Batch operations: Scanner, BatchWriter, BatchScanner
  • Lab
Module 5: Accumulo design patterns
  • How to present your design
  • Flexible schemas
  • Use of indexing
  • Single-entity tables
  • Unique keys
  • Design lab
  • Time series data
  • Use of denormalization
  • Joins and pre-joins
  • Indices
  • Teams lab
Module 6: Hadoop integration
  • Using Accumulo with Hadoop and other Hadoop ecosystem tools
  • Imitating relational operations
  • Client-side iterators
  • Lab
Module 7: Server-side optimizations
  • Iterators
  • Constraints
  • Initial load (bulk load)
  • Lab
Module 8: Cells and partitions
  • Domain-specific authorization
  • Wide vs tall
  • Reasoning about locality
Module 9: Data retrieval patterns
  • Statistics
  • Query time optimization
  • Partitioned joins
Module 10: Data science with Accumulo, conclusion
  • Graph search
  • Machine learning
  • Geo information
  • Administration and performance optimization
Classroom Training

Duration: 2 days

  • United States: US$ 1,750
Online Training

Duration: 2 days

  • United States: US$ 1,750