Description

Data science projects are not typical BI projects. They start with a business problem or opportunity to be explored and result in new insights as well as analytical models, which means they have different deliverables, pitfalls and challenges. Attempting to deliver a data science project with traditional BI project methodologies contributes to the high failure rate of data science projects. CRISP-DM (Cross-Industry Standard Process for Data Mining) is a widely accepted methodology for data science projects and addresses the factors they need to succeed.

This hands-on workshop gives Business Intelligence practitioners, data analysts, and those looking to get started in data science applied experience in identifying the data science problem or opportunity, choosing the modeling approach, selecting the right features to model, evaluating the results and deploying the model. The workshop covers a wide range of data preparation and modeling exercises, from data sandbox construction to the creation of training, test, and validation data sets for model development.

We will provide a few data sets, jump-start workflows and final solutions for the exercises.

Please note that this workshop is NOT for data science developers or IT developers looking to write analytics programs in Java, Python, R or Scala. 

Why attend

You will learn to:

  • Understand a data science methodology and the end-to-end workflow for solving a problem, including data understanding and preparation, model building and validation, and model deployment
  • Match data science problems and opportunities to the best-fit models
  • Prepare data for different kinds of models
  • Handle missing data, outliers and quirks
  • Create training, test and validation data sets for model development
  • Decide when to apply supervised or unsupervised machine learning models
  • Build prediction, classification and clustering models
  • Deploy machine learning models into the cloud

Who should attend

  • Analytics professionals, including business intelligence and data management professionals
  • Business analysts, data analysts and functional analysts
  • IT professionals and consultants
  • Analytics leaders
  • Program and project leaders
  • Anyone who aspires to become a data scientist

Prerequisites

You should have some coding experience and basic knowledge of statistics.

Code: DS2021
Price: 725 EUR

Outline

Data Science Overview

  • Workshop 1: What is data science and what is data science not?
  • Different types of data science problems or opportunities
  • From reporting to analytics to data science
  • Statistics, data mining and machine learning: the same but different
  • Workshop 2: Why do data science projects succeed or fail?
  • CRISP-DM/PA/DS overview
    • Business understanding
    • Data understanding
    • Data preparation
    • Modelling
    • Evaluation
    • Deployment
  • Data science toolbox
    • R & Python
    • Libraries
    • The big data ecosystem
    • Data science workbenches
  • Workshop 3: Data Science vs BI

Data Preparation for Data Science

  • Getting data ready for machine learning models
  • Tools, infrastructure & platforms
    • Hand coding vs ETL tools vs self-service data preparation tools
    • Introduction to R & KNIME
  • Workshop 4: Creating a sandbox for data science
  • Data Understanding
    • Data requirements
    • Explore data (with visualizations)
    • Assess data quality
  • Workshop 5: Data exploration and visualisations
  • Data preparation techniques
    • Selecting
    • Cleansing: missing data, outliers and data quality issues
    • Data Construction - Feature Engineering
    • Data Integration
    • Formatting
    • Reviewing feature selection
  • Workshop 6: Data Engineering with Python and Knime
    • Create a data preparation pipeline in R
    • Create a data preparation pipeline in KNIME
  • Data preparation considerations for choosing the correct machine learning model

Building Machine Learning Models

  • Introduction to machine learning model development
  • When to apply supervised or unsupervised learning
  • Statistical modeling
  • Choosing the right machine learning models
  • Common analytical and machine learning algorithms
    • Prediction
      • Linear regression
      • Logistic regression
      • Principal component analysis
      • Issues unique to prediction problems
    • Classification
      • Decision trees
      • Random forest
      • Neural networks
      • Issues unique to classification problems
    • Clustering
      • K-means clustering
      • Issues unique to clustering problems
  • Workshop 7: Match machine learning problems or opportunities with the best-fit models
  • Model validation
    • Accuracy, precision, recall, F1-score
  • Workshop 8: Modelling and model evaluation with R or KNIME
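
As a rough illustration of the model validation metrics listed above (accuracy, precision, recall and F1-score), here is a minimal sketch in plain Python for a binary classifier. It is not taken from the workshop material: the function name `binary_metrics` and the example labels are hypothetical, and the workshop exercises themselves use R or KNIME.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels.

    Hypothetical helper for illustration only; 1 is the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 3 true positives, 3 true negatives,
# 1 false positive and 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can mislead on imbalanced data, which is why precision, recall and F1 are usually reported alongside it.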

Deployment: From Lab to Production

  • Planning for deployment
  • Building a reproducible machine learning pipeline
    • Serving the model via REST API
    • Running ML applications with containers
    • Deploying into the cloud
  • Model management
    • Regulatory and operational complexity
  • Continuous integration and deployment pipelines
  • Workshop 9: Deploying a predictive model in Microsoft Azure

Instructor

David de Roos

David de Roos is a Senior Solution Architect in the areas of Business Intelligence, (Big) Data Engineering and Data Science. He works on all aspects of Business Intelligence and Big Data Analytics projects, including data mining and predictive modelling. He has experienced and implemented the shift from simple analytical statistics to complex data science models. In addition to designing data models and ETL processes and visualizing data, he has implemented churn prediction, fraud detection and credit risk prediction models in the finance and energy sectors.

David obtained his Master's degree in Sociology and Statistics at Erasmus University Rotterdam. He has co-authored articles on political preference prediction and has experience teaching statistics. In addition to his regular work, he supports students who need statistical and methodological help as they work toward graduation.

Dates

This course is only available as Customer Specific Training: a private course delivered at a location (or virtually) and time that suit you, with content tailored to your specific learning needs. Contact us by e-mail at info@q4k.com.

Pricing

The fee for this 1-day course is EUR 725 (+VAT) per person.

We offer the following discounts.

  • 10% discount for groups of 2 or more students from the same company registering at the same time.
  • 20% discount for groups of 4 or more students from the same company registering at the same time.

Note: Groups that register at a discounted rate must retain the minimum group size or the discount will be revoked. Discounts cannot be combined.
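
To make the group discounts concrete, the following minimal Python sketch computes the total fee excluding VAT for a group. It assumes the discount applies to every student in the group and that only the larger applicable discount is used, since discounts cannot be combined. The function names are hypothetical, and VAT is left out because no rate is stated here.

```python
BASE_FEE_EUR = 725  # per-person fee stated above, excluding VAT


def group_discount_percent(students: int) -> int:
    """Return the discount percentage for a group of the given size."""
    if students >= 4:
        return 20  # groups of 4 or more
    if students >= 2:
        return 10  # groups of 2 or more
    return 0  # individual registration


def total_fee_excl_vat(students: int) -> float:
    """Total fee in EUR (excluding VAT) for `students` from one company.

    Assumes the discount applies to the whole group (an assumption,
    not a statement of the provider's invoicing rules).
    """
    percent = group_discount_percent(students)
    return students * BASE_FEE_EUR * (100 - percent) / 100


for n in (1, 2, 4):
    print(n, "student(s):", total_fee_excl_vat(n), "EUR excl. VAT")
```

Under these assumptions, one student pays EUR 725, two students pay EUR 1,305 in total, and four students pay EUR 2,320 in total, all excluding VAT.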

Copyright ©2023 quest for knowledge