Accelerating Data Engineering Pipelines (ADEP)

 

Course Overview

Explore how to use GPU-accelerated data engineering tools and techniques to significantly speed up data engineering pipelines.

Please note that a confirmed booking is non-refundable. Once you have confirmed your seat for an event, it cannot be cancelled and no refund will be issued, regardless of attendance.

Prerequisites

  • Intermediate knowledge of Python (list comprehension, objects)
  • Familiarity with pandas is a plus
  • Introductory statistics (mean, median, mode)

Course Objectives

  • How data moves within a computer, and how to strike the right balance between CPU, DRAM, disk storage, and GPUs.
  • How different file formats can be read and manipulated by hardware.
  • How to scale an ETL pipeline with multiple GPUs using NVTabular.
  • How to build an interactive Plotly dashboard where users can filter on millions of data points in less than a second.

Outline: Accelerating Data Engineering Pipelines (ADEP)

Introduction

  • Meet the instructor.
  • Create an account at courses.nvidia.com/join

Data on the Hardware Level

  • Explore the strengths and weaknesses of different hardware approaches to data and the frameworks that support them:
    • pandas
    • cuDF
    • Dask

ETL with NVTabular

  • Learn how to scale an ETL pipeline from 1 GPU to many with NVTabular through the perspective of a big data recommender system.
    • Transform raw JSON into analysis-ready Parquet files.
    • Learn how to quickly add features to a dataset using operators such as Categorify and Lambda.
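As a rough illustration of the idea behind a Categorify-style operator, the sketch below maps string categories parsed from raw JSON to integer IDs in plain Python. It does not use NVTabular or a GPU. The function name `categorify`, the sample records, and the choice to reserve ID 0 for unseen values are all assumptions made for this example.

```python
import json


def categorify(values):
    """Map each distinct value to an integer ID, in order of first appearance.

    IDs start at 1; ID 0 is reserved for unseen values (an assumption
    made for this sketch).
    """
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


# Hypothetical raw JSON lines standing in for recommender-system logs.
raw_lines = [
    '{"user": "u1", "item": "book"}',
    '{"user": "u2", "item": "lamp"}',
    '{"user": "u1", "item": "lamp"}',
]
records = [json.loads(line) for line in raw_lines]

# Build the category-to-ID mapping, then encode the item column with it.
item_ids = categorify(record["item"] for record in records)
encoded = [item_ids.get(record["item"], 0) for record in records]
```

Integer IDs like these are what downstream models typically consume in place of raw strings; a GPU framework performs the same mapping over whole columns at once.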

Data Visualization

  • Step into the shoes of a meteorologist and learn how to plot precipitation data on a map.
  • Learn how to use descriptive statistics and plots such as histograms to assess data quality.
  • Learn to use memory effectively so that users can quickly filter data through a graphical interface.
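The data-quality check mentioned above can be sketched with the Python standard library alone. The readings, the `-999.0` missing-value sentinel, and the helper names `summarize` and `histogram` are hypothetical and chosen only for illustration; the course itself may use different tools.

```python
import statistics

# Hypothetical daily precipitation readings in millimetres; -999.0 is a
# made-up sentinel standing in for a missing measurement.
readings = [0.0, 1.2, 0.0, 3.4, -999.0, 0.8, 2.1, 0.0]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def histogram(values, bin_width):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = {}
    for value in values:
        k = int(value // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))


# An implausible minimum in the raw summary exposes the sentinel value.
raw_summary = summarize(readings)

# Drop physically impossible (negative) readings, then re-check.
valid = [value for value in readings if value >= 0]
clean_summary = summarize(valid)
bins = histogram(valid, 1.0)
```

Comparing `raw_summary` with `clean_summary` shows how a single bad value distorts the minimum and the mean, which is the kind of signal a histogram makes visible at a glance.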

Final Project: Data Detective

  • Users are complaining that the dashboard is too slow. Apply the techniques learned in class to find and eliminate inefficiencies in the backend code.

Final Review

  • Review key learnings and answer questions.
  • Complete the assessment and earn your certificate.
  • Complete the workshop survey.
  • Learn how to set up your own AI application development environment.

Prices & Delivery methods

Online Training

Duration
1 day

Price
  • US$ 500

Classroom Training

Duration
1 day

Price
  • United States: US$ 500

Schedule

Currently there are no training dates scheduled for this course.