Cloudera Search Training (CST)

Cloudera Search Training is now available in OnDemand e-learning.

$1815.00 USD

About this Course

Cloudera University’s three-day Search training course is for developers and data engineers who want to index data in Hadoop for more powerful real-time queries. You will learn to get more value from your data by integrating Cloudera Search with external applications. Through instructor-led discussion and interactive, hands-on exercises, you will learn to navigate the Hadoop ecosystem.

Who should attend

  • Developers
  • Data Engineers

Class Prerequisites

  • Basic familiarity with Hadoop
  • Experience programming in a general-purpose language such as Java, C, C++, Perl, or Python
  • Comfort with the Linux command line
  • No prior experience with Apache Solr or Cloudera Search is required

What You Will Learn

By the end of this course, you will be able to:

  • Perform batch indexing of data stored in HDFS and HBase
  • Perform indexing of streaming data in near-real-time with Flume
  • Index content in multiple languages and file formats
  • Process and transform incoming data with Morphlines
  • Create a user interface for your index using Hue
  • Integrate Cloudera Search with external applications
  • Improve the Search experience using features such as faceting, highlighting, and spelling correction

Outline: Cloudera Search Training (CST)

Module 1: Overview of Cloudera Search

  • What is Cloudera Search?
  • Helpful Features
  • Use Cases
  • Basic Architecture

Module 2: Performing Basic Queries

  • Executing a Query in the Admin UI
  • Basic Syntax
  • Techniques for Approximate Matching
  • Controlling Output

Module 3: Writing More Powerful Queries

  • Relevancy and Filters
  • Query Parsers
  • Functions
  • Geospatial Search
  • Faceting

Module 4: Preparing to Index Documents

  • Overview of the Indexing Process
  • Understanding Morphlines
  • Generating Configuration Files
  • Schema Design
  • Collection Management

Module 5: Batch Indexing HDFS Data with MapReduce

  • Overview of the HDFS Batch Indexing Process
  • Using the MapReduce Indexing Tool
  • Testing and Troubleshooting

Module 6: Near-Real-Time Indexing with Flume

  • Overview of the Near-Real-Time Indexing Process
  • Introduction to Apache Flume
  • How to Perform Near-Real-Time Indexing with Flume
  • Testing and Troubleshooting

Module 7: Indexing HBase Data with Lily

  • What is Apache HBase?
  • Batch Indexing for HBase
  • Indexing HBase Tables in Near-Real-Time

Module 8: Indexing Data in Other Languages and Formats

  • Field Types and Analyzer Chains
  • Word Stemming, Character Mapping, and Language Support
  • Schema and Analysis Support in the Admin UI
  • Metadata and Content Extraction with Apache Tika
  • Indexing Binary File Types with SolrCell

Module 9: Improving Search Quality and Performance

  • Delivering Relevant Results
  • Helping Users Find Information
  • Query Performance and Troubleshooting

Module 10: Building User Interfaces for Search

  • Search UI Overview
  • Building a User Interface with Hue
  • Integrating Search into Custom Applications

Module 11: Considerations for Deployment

  • Planning for Deployment
  • Determining Hardware Needs
  • Security Overview
  • Collection Aliasing

Classroom Training

Duration 3 days

  • United States: US$ 2,595

Online Training

Duration 3 days

  • United States: US$ 2,595

OnDemand E-Learning

  • United States: US$ 1,815