Course Overview
This 13.5-hour course is for users who want to attain operational intelligence level 4 (business insights). It covers implementing analytics and data science projects using Splunk's statistics, machine learning, and built-in and custom visualization capabilities.
Please note that this course may run over three days, with a 4.5-hour session each day.
Prerequisites
To be successful, students should have a solid understanding of the following courses:
- Intro to Splunk
 - Using Fields (SUF)
 - Scheduling Reports & Alerts
 - Visualizations
 - Working with Time (WWT)
 - Statistical Processing (SSP)
 - Comparing Values (SCV)
 - Result Modification (SRM)
 - Leveraging Lookups and Subsearches (LLS)
 - Correlation Analysis (SCLAS)
 - Search Under the Hood
 - Intro to Knowledge Objects
 - Creating Field Extractions (CFE)
 - Search Optimization (SSO)
 - Exploring and Analyzing Data with Splunk (EADS)
 
Course Objectives
- Analytics Framework
 - Regression for Prediction
 - Cleaning and Preprocessing Data
 - Algorithms, Preprocessing and Feature Extraction
 - Clustering Data
 - Detecting Anomalies
 - Forecasting
 - Classification
 
Outline: Splunk for Analytics and Data Science (SADS)
Topic 1 – Analytics Workflow
- Define terms related to analytics and data science
 - Describe the analytics workflow
 - Describe common usage scenarios
 - Navigate Splunk Machine Learning Toolkit
 
Topic 2 – Training and Testing Models
- Split data for testing and training using the sample command
 - Describe the fit and apply commands
 - Use the score command to evaluate models
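
As a rough illustration of the split, fit, apply, and score workflow named in this topic, here is a minimal sketch in plain Python rather than SPL. The function names (split_rows, fit_line, apply_line, r_squared) and the synthetic data are assumptions made for this example only; they are not Splunk commands or course material.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_line(model, xs):
    """Apply a fitted (slope, intercept) model to new x values."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def r_squared(actual, predicted):
    """Score predictions with the coefficient of determination."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Synthetic data (hypothetical): y = 3x + 2 with a small repeating offset.
rows = [(x, 3 * x + 2 + ((x % 3) - 1) * 0.1) for x in range(20)]

train, test = split_rows(rows)                         # split
model = fit_line(train)                                # fit on training rows only
predictions = apply_line(model, [x for x, _ in test])  # apply to held-out rows
score = r_squared([y for _, y in test], predictions)   # score on held-out rows
```

The point of the sketch is the ordering: the model is fitted only on the training rows and evaluated only on rows it has not seen.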
 
Topic 3 – Regression: Predict Numerical Values
- Differentiate predictions from estimates
 - Identify prediction algorithms and assumptions
 - Model numeric predictions in the MLTK and Splunk Enterprise
 
Topic 4 – Clean and Preprocess the Data
- Define preprocessing and describe its purpose
 - Describe algorithms that preprocess data for use in models
 - Use FieldSelector to choose relevant fields
 - Use PCA and ICA to reduce dimensionality
 - Normalize data with StandardScaler and RobustScaler
 - Preprocess text using Imputer, NPR, TF-IDF, and HashingVectorizer
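
To make the difference between the two normalization approaches named above concrete, here is a minimal sketch in plain Python. It assumes the common definitions: standard scaling subtracts the mean and divides by the standard deviation, while robust scaling subtracts the median and divides by the interquartile range. The helper names and sample values are hypothetical, and the sketch does not show how the toolkit implements these algorithms.

```python
import statistics


def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    median = statistics.median(values)
    return [(v - median) / (q3 - q1) for v in values]


# Hypothetical field values with one extreme outlier.
values = [1, 2, 3, 4, 100]

standard = standard_scale(values)
robust = robust_scale(values)
```

With the outlier present, the mean and standard deviation are pulled toward 100, so the four typical values end up squeezed close together after standard scaling. The median and interquartile range are barely affected, so robust scaling keeps the typical values evenly spread.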
 
Topic 5 – Clustering
- Define clustering
 - Identify clustering methods, algorithms, and use cases
 - Use Smart Clustering Assistant to cluster data
 - Evaluate clusters using silhouette score
 - Validate cluster coherence
 - Describe clustering best practices
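
As an illustration of evaluating clusters with a silhouette score, here is a minimal sketch in plain Python. It uses the standard definition: for each point, a is the mean distance to other points in its own cluster, b is the lowest mean distance to the points of any other cluster, and the point's score is (b - a) / max(a, b). The function name, the points, and the labels are assumptions made for this example only.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Distances to the other members of this point's own cluster.
        same = [
            dist(p, q)
            for j, (q, other) in enumerate(zip(points, labels))
            if other == label and j != i
        ]
        if not same:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = sum(same) / len(same)

        # Lowest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels)
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two visibly separate groups of points (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

good_labels = [0, 0, 0, 1, 1, 1]  # labels follow the two groups
bad_labels = [0, 1, 0, 1, 0, 1]   # labels mix the two groups

good = silhouette_score(points, good_labels)
bad = silhouette_score(points, bad_labels)
```

Scores close to 1 indicate compact, well-separated clusters; scores near 0 or below suggest that points sit between clusters or have been assigned to the wrong one.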
 
Topic 6 – Forecasting Fields
- Differentiate predictions from forecasts
 - Use the Smart Forecasting Assistant
 - Use the StateSpaceForecast algorithm
 - Forecast multivariate data
 - Account for periodicity in each time series
 
Topic 7 – Detect Anomalies
- Define anomaly detection and outliers
 - Identify anomaly detection use cases
 - Use Splunk Machine Learning Toolkit Smart Outlier Assistant
 - Detect anomalies using the Density Function algorithm
 - View results with the Distribution Plot visualization
 
Topic 8 – Classify: Predict Categorical Values
- Define key classification terms
 - Identify when to use different classification algorithms
 - Evaluate classifier tradeoffs
 - Evaluate results of multiple algorithms