Description

Why Does the Traditional Data Warehouse Need Modernizing?

In this video, Mike Ferguson explains why data warehouses have to change not only to speed up development, improve agility and reduce costs but also to exploit new data, enable self-service data preparation, utilize advanced analytics and integrate with other analytical platforms.

In today’s digital economy, the customer is all-powerful and can switch loyalty in a single click while on the move from a mobile device. The internet has made loyalty cheap, and many CEOs want new data to enrich what they already know about customers in order to keep them loyal and offer them a more personalised service. In addition, companies are capturing new data using sensors to gain insight into what is happening and to optimise business operations. This new data is causing many companies with traditional data warehouses and data marts to realise that these alone are no longer enough for analytics. Other systems are needed, and with the pace of change quickening, lower-latency data and machine learning are in demand everywhere. All of this is needed to remain competitive. So how do you modernize your analytical set-up to improve governance and agility, bring in new data, re-use data assets, make your data warehouse easily accommodate change, lower data latency and integrate with other analytical workloads to provide a modern data warehouse for the digital enterprise?

This 2-day course looks at why you need to do this. It discusses the tools and techniques needed to capture new data types, establish new data pipelines across cloud and on-premises systems, produce re-usable data assets, modernize your data warehouse and bring together the data and analytics needed to accelerate time to value.

Why attend

After completing this course, you will:

  • Understand why data warehouse modernization is needed to help improve decision-making and competitiveness
  • Know how to modernize your data warehouse to improve agility, reduce the cost of ownership and facilitate easier maintenance
  • Understand modern data modeling techniques and how to reduce the number of data stores in a data warehouse without losing information
  • Understand how to exploit cloud computing at a lower cost and how to migrate to the cloud
  • Understand how to reduce data latency in your data warehouse
  • Understand how to integrate your data warehouse with data stored in cloud storage data lakes
  • Know how to migrate from a waterfall-based data warehouse and data marts to a lean, modern logical data warehouse with virtual data marts that integrate easily with cloud storage and other analytical systems
  • Know how to use data virtualization to simplify access to a more comprehensive set of insights available on multiple analytical platforms running analytics on different types of data for precise evidence-based decision making
  • Understand the role of a modern data warehouse in a data-driven enterprise

Who should attend

CDOs, CIOs, IT managers, CTOs, business analysts, data analysts, data scientists, BI managers, business intelligence and data warehousing professionals, enterprise architects, data architects and solution architects.

Related Resources 

How to Use Data Warehouse Automation Tools?

Mike Ferguson explains how you can use metadata-driven data warehouse automation tools to rapidly build, change and extend modern cloud and on-premises data warehouses and data marts.

Code: DWM2024
Price: 1.450 EUR

Outline

The Traditional Data Warehouse and Why It Needs Modernizing

For many organizations today, their data warehouse is still based on a waterfall-style architecture, with data flowing from source systems into operational data stores, staging areas and then on to data warehouses under the management of batch ETL jobs. However, the analytical landscape has changed. New data sources continue to grow, with data now being collected in edge devices, cloud storage, cloud or on-premises NoSQL data stores and Hadoop systems, as well as in data warehouse staging areas. Cloud storage, Spark, streaming data platforms and graph databases are also now used in data analysis. In addition, many business units are using the cloud to quickly exploit these new analytical technologies at lower cost.

This module looks at these new activities and explains why data warehouses have to change not only to speed up development, improve agility and reduce costs but also to exploit new data, enable self-service data preparation, utilize advanced analytics and integrate with these other analytical platforms.

  • The traditional data warehouse
  • Multiple data stores, waterfall data architecture and data flows
  • New data entering the enterprise
  • The changing face of analytics – new analytical data stores and platforms
    • Big Data analytics on Spark, cloud storage and Hadoop
    • Real-time streaming data analytics
    • Graph analysis in Graph Databases
  • New challenges brought about by:
    • Data complexity
    • Data management silos
    • Managing data in a distributed and hybrid computing environment
    • Self-service data prep vs ETL/DQ
  • Problems with existing data warehouse architecture and development techniques
  • The need to avoid silos, accommodate new data, accommodate change quickly and integrate analytical workloads to deliver value

Modern Data Warehouse Requirements

This module looks at the key building blocks of modern data warehouses that need to be in place for flexibility and agility.

  • Modern data modelling techniques
  • Accelerating ETL processing using automated data discovery, a data catalog, DataOps pipelines and re-usable data products
  • Cloud-based analytical DBMSs
  • External tables, Lakehouses and in-database analytics
  • Shortening development time using data warehouse automation
  • Data virtualization for data independence, flexibility and to integrate new analytical data stores into a logical data warehouse
  • Incorporating fast streaming data, prescriptive analytics, embedded and operational BI

Modern Data Modelling Techniques for Agile Data Warehousing

To improve agility, change-friendly data modelling techniques have emerged and are becoming increasingly popular in the design of modern data warehouses. This module looks at data modelling and asks: Is the star schema dead? Which data warehouse modelling technique is best suited to handling change? Should you use Data Vault? Does data warehouse design need to change? Does data mart design need to change? It also looks at the disadvantages of such techniques and how you can overcome them.

  • Data warehouse modelling approaches - Inmon vs Kimball vs Data Vault
  • The need to handle change easily
  • What is Data Vault?
  • Data vault modelling components – hubs, links and satellites (a minimal schema sketch follows this list)
  • Pros and cons of data modelling techniques
  • Using data virtualization to improve agility in data marts while reducing cost
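
To make the hub, link and satellite components concrete, here is a minimal schema sketch for customers and orders. It uses Python's built-in sqlite3 module purely for illustration, and all table and column names are invented for this example; a real design would follow your own hash key, load-date and record-source standards.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # illustration only; any RDBMS would do

con.executescript("""
-- Hub: one row per unique business key
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,    -- hash of the business key
    customer_id   TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);

CREATE TABLE hub_order (
    order_hk      TEXT PRIMARY KEY,
    order_id      TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);

-- Link: relationships between hubs, and nothing else
CREATE TABLE link_customer_order (
    customer_order_hk TEXT PRIMARY KEY,
    customer_hk       TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    order_hk          TEXT NOT NULL REFERENCES hub_order(order_hk),
    load_date         TEXT NOT NULL,
    record_source     TEXT NOT NULL
);

-- Satellite: descriptive attributes, versioned by load date
CREATE TABLE sat_customer_details (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_date     TEXT NOT NULL,
    name          TEXT,
    email         TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_date)
);
""")
```

Because new attributes land in new satellites and new relationships in new links, change is additive: existing tables and the jobs that load them do not need rewriting, which is the agility argument behind Data Vault.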

Modernizing Your ETL Processing

This module looks at the challenges that new data poses for ETL processing. What options are available to modernize ETL processing, where should it run and what are the pros and cons of each option? And how does this impact your data architecture?

  • New data and ETL processing
    • High-volume data, semi-structured data, unstructured data and streaming data, e.g. IoT data
  • What are the implications and challenges of this new data on ETL processing?
  • Should all this data go into a data warehouse or not?
  • What options are available to modernize data warehouse ETL processing?
    • Data Lake vs Lakehouse vs Data Mesh
    • Offloading staging data to a data lake and using Spark for scalable ELT processing (see the sketch after this list)
    • Using data warehouse automation software to generate ETL processing
  • Pros and cons of these options
  • Data architecture implications of modernizing and democratizing data engineering
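
As a sketch of the offloading option above, the following PySpark job reads raw staged data from data lake storage, does the scalable transformation work in Spark and writes a curated data set that the warehouse can ingest or expose as an external table. The s3a:// paths and field names are placeholders invented for this example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lake-elt-sketch").getOrCreate()

# Extract: read raw staged files straight from cloud object storage
orders = spark.read.json("s3a://raw-zone/orders/2024/*.json")

# Transform: cleanse and conform at Spark scale rather than in the warehouse
conformed = (
    orders
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_date", F.to_date("order_ts"))
    .withColumn("amount_eur", F.col("amount").cast("decimal(12,2)"))
    .dropDuplicates(["order_id"])
)

# Load: write a curated, partitioned data set for the warehouse to ingest
# or to query in place via an external table
(conformed.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3a://curated-zone/orders/"))
```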

Accelerating ETL Processing Using a Data Catalog, Data Fabric, Data Products and a Data Marketplace

This module looks at how you can use a data catalog, data fabric, reusable data products and a data marketplace, implemented on a multi-purpose data lake, a lakehouse or a data mesh, to accelerate ETL processing and the integration of data for your data warehouse.

  • How can you accelerate ETL processing and self-service data preparation?
  • Decentralizing data engineering
  • Using Data Fabric to build DataOps pipelines to create reusable data products in a Data Mesh
  • Options for implementing Data Mesh - data lakes, lakehouses, cloud data platforms, Kafka and data virtualization
  • Directly accessing data sources vs ingesting and staging your data in readiness for ELT processing 
  • Using a data catalog to automatically discover, classify, quality-profile, catalog and map data to a business glossary
  • GDPR - Detecting sensitive data during automatic data discovery
  • Masking GDPR-sensitive data during ingestion or ETL pipeline execution (a minimal masking sketch follows this list)
  • Using machine learning in DataOps ELT pipelines to process unstructured data
  • Pipeline processing streaming data in a real-time data warehouse
    • Types of streaming data - IoT data, weblogs, OLTP system change data capture, …
    • Key technologies for processing streaming data – Kafka, streaming analytics and event stores
    • Turning OLTP change data capture into Kafka data streams
    • Using Kafka as a data source to process data in real-time
    • Running ETL processing at the edge vs on the cloud or the data centre
    • Future-proofing streaming ETL processing using Apache Beam
    • Ingesting streaming data into your data lakehouse or data warehouse
  • Real-time data warehouse – integrating your data warehouse with streaming data using external tables, data virtualization and a data lake or lakehouse
  • Using ETL data pipelines to produce re-usable data products for use in your data warehouse, data science and other analytical data stores
  • Publishing reusable data products in a data marketplace ready for consumption
  • Using Data Science to develop new analytical models to run in your data warehouse
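
To illustrate the masking point in the list above, here is a minimal Python sketch that pseudonymises sensitive fields with a keyed hash as records pass through an ingestion pipeline. The field names and key are invented for this example; in practice you would normally use the masking functions of your data fabric or data quality tooling and hold keys in a secrets manager.

```python
import hashlib
import hmac

# Columns flagged as sensitive by automatic data discovery / the data catalog
SENSITIVE_FIELDS = {"email", "phone", "national_id"}
SECRET_KEY = b"rotate-me"  # placeholder; keep real keys in a secrets manager

def pseudonymise(value: str) -> str:
    """Deterministic keyed hash: joins still work, raw values do not leak."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def mask_record(record: dict) -> dict:
    """Mask sensitive fields in one record during pipeline execution."""
    return {
        k: pseudonymise(v) if k in SENSITIVE_FIELDS and v is not None else v
        for k, v in record.items()
    }

# Example: one record flowing through an ingestion pipeline
masked = mask_record({"customer_id": "C42", "email": "jane@example.com"})
print(masked["email"])  # a stable 64-character hash, not the address
```

Deterministic hashing keeps masked keys joinable across tables, which matters when the masked data still has to feed the data warehouse and downstream analytics.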

Rapid Data Warehouse Development Using Data Warehouse Automation

In addition to modernizing ETL processing, this module looks at how you can use metadata-driven data warehouse automation tools to rapidly build, change and extend modern cloud and on-premises data warehouses and data marts. It looks at how these tools help you adopt modern data modelling techniques quickly, how they generate schemas and data integration jobs, and how they can help you migrate to new data warehouse systems on the cloud.

  • What is data warehouse automation?
  • Using data warehouse automation tools for rapid data warehouse and data mart development
    • Data pipeline generation
    • Processing streaming data using data warehouse automation
    • Integrating big data with a data warehouse using data warehouse automation
    • Integrating cloud data warehouses with data lakes using data warehouse automation
    • Integrating business glossaries with data warehouse automation tools
    • Using data warehouse automation to migrate data warehouses
    • Using data virtualization to shield existing BI tools from changes in design
  • The data warehouse automation tools market, e.g. IDERA WhereScape, biGENIUS, Qlik Attunity Compose, TimeExtender, Varigence BIMLStudio, VaultBuilder & more
  • Metadata-driven data warehouse maintenance (a toy generator sketch follows below)
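
At heart, data warehouse automation tools are metadata-driven code generators. The toy Python sketch below implies no particular product and uses invented names; it simply shows the underlying idea that DDL and load code are generated from a declarative definition, so a model change becomes a metadata change rather than hand-edited code.

```python
# Toy metadata-driven generator: a declarative table definition in,
# DDL and a skeleton load statement out
dim_customer = {
    "name": "dim_customer",
    "columns": [("customer_key", "BIGINT"),
                ("name", "VARCHAR(100)"),
                ("country", "VARCHAR(2)")],
    "source": "stg_customer",
}

def generate_ddl(spec: dict) -> str:
    cols = ",\n  ".join(f"{col} {dtype}" for col, dtype in spec["columns"])
    return f"CREATE TABLE {spec['name']} (\n  {cols}\n);"

def generate_load(spec: dict) -> str:
    cols = ", ".join(col for col, _ in spec["columns"])
    return (f"INSERT INTO {spec['name']} ({cols})\n"
            f"SELECT {cols} FROM {spec['source']};")

print(generate_ddl(dim_customer))
print(generate_load(dim_customer))
```

Commercial tools add far more around this core (change tracking, scheduling, lineage and documentation), but the generation-from-metadata principle is the same.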

Building a Modern Data Warehouse in a Cloud Computing Environment

A key question for many organisations is what to do with the existing data warehouse: should you try to change the existing set-up to make it more modern, or re-develop it in the cloud? This module looks at the advantages of building modern data warehouses in a cloud computing environment using a cloud-based analytical relational DBMS.

  • Why use cloud computing for your data warehouse?
  • Cloud-based data warehouse development – what are the options?
  • Cloud-based analytical relational DBMSs, e.g. Amazon Redshift, Google BigQuery, IBM Db2 Warehouse on Cloud, Microsoft Azure Synapse Analytics, Oracle Autonomous Data Warehouse, Snowflake, SAP Data Warehouse Cloud, Teradata and Kinetica
  • Lakehouses – Apache Iceberg, Databricks, HPE Ezmeral
  • Separating storage from compute for elasticity and scalability
  • Managing and integrating cloud and on-premises data
  • Using iPaaS software to integrate data in cloud ETL processing – Informatica IICS, Boomi, SnapLogic, ...
  • Non-iPaaS Cloud ETL tools, e.g. AWS Glue, Azure Data Factory, Google Cloud Dataplex and Data Fusion, IBM Cloud Pak for Data, Talend, Software AG StreamSets
  • Managing streaming data in the cloud
  • Integrating big data analytics into a cloud-based data warehouse (see the external-table sketch after this list)
  • Training and deploying machine learning models in your analytical database for in-warehouse analytics
  • Tools and techniques for migrating an existing data warehouse to the cloud
  • Migrating DW schema, data, ETL and security
  • Dealing with cloud DW migration issues like data types, SQL differences, privilege differences, data volumes
  • Managing access to cloud-based data warehouses
  • Integrating cloud-based BI tools with on-premises systems
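
As one concrete pattern for the external-table and lake-integration points above, the sketch below uses Google BigQuery, one of the DBMSs listed; the dataset, table and bucket names are invented for this example. It defines an external table over Parquet files left in cloud storage and joins it to warehouse-resident data without ingesting the files first.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default GCP credentials

# External table over Parquet files left in the data lake (placeholder URIs)
client.query("""
CREATE OR REPLACE EXTERNAL TABLE analytics.clickstream_ext
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-lake/clickstream/*.parquet']
);
""").result()

# Query lake data and warehouse data together; only the results move
rows = client.query("""
SELECT c.customer_id, COUNT(*) AS clicks
FROM analytics.clickstream_ext AS e
JOIN analytics.dim_customer AS c ON c.customer_id = e.customer_id
GROUP BY c.customer_id
ORDER BY clicks DESC
LIMIT 10
""").result()

for row in rows:
    print(row.customer_id, row.clicks)
```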

Simplifying Data Access – Creating Virtual Data Marts and a Logical Data Warehouse Architecture to Integrate Big Data With Your Data Warehouse

This module looks at how you can use data virtualization software to modernize your data warehouse architecture. It also looks at how to simplify access to, and integration of, data in your data warehouse and the underlying big data stores, and how to improve agility.

  • What is data virtualization?
  • How does data virtualization work?
  • How can data virtualization reduce the cost of ownership, improve agility and modernize your data warehouse architecture?
  • Simplifying your architecture by using data virtualization to create Virtual Data Marts (a minimal sketch follows this list)
  • Migrating your physical data marts to virtual data marts to reduce the cost of ownership
  • Layering virtual tables on top of virtual marts to simplify business user access
  • Publishing virtual views and queries as services in a data catalog for consumption
  • Integrating your data warehouse with your data lake and low latency data using external tables and data virtualization
  • Enabling rapid change management using data virtualization
  • Creating a logical data warehouse architecture that integrates data from big data platforms, graph databases, streaming data platforms and your data warehouse into a common access layer for easy access by BI tools and applications
  • Using a business glossary and data virtualization to create a common semantic layer with consistent common understanding across all BI tools
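
Enterprise data virtualization servers do this at scale across many sources, but the idea of a common access layer of virtual tables can be sketched in a few lines with DuckDB; the file paths, view names and columns below are invented for this example.

```python
import duckdb

con = duckdb.connect()  # in-process engine standing in for a virtualization server

# Virtual tables over physically separate stores; no data is copied
con.execute("""
    CREATE VIEW lake_orders AS
    SELECT * FROM read_parquet('lake/orders/*.parquet')
""")
con.execute("""
    CREATE VIEW crm_customers AS
    SELECT * FROM read_csv_auto('exports/crm_customers.csv')
""")

# A 'virtual data mart': one business-facing view joining both sources,
# resolved at query time
con.execute("""
    CREATE VIEW vm_customer_orders AS
    SELECT c.customer_id, c.segment, o.order_id, o.amount
    FROM crm_customers AS c
    JOIN lake_orders AS o USING (customer_id)
""")

print(con.execute("SELECT COUNT(*) FROM vm_customer_orders").fetchone())
```

Because consumers only ever query the views, the underlying stores can be moved or replaced without changing BI tools, which is the rapid change management point above.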

Getting Started With Data Warehouse Modernization

This final module looks at what you have to do to get started with a data warehouse modernization initiative. In particular, it looks at:

  • Data Warehouse Modernization options
    • Change vs rebuild?
  • How do you minimize the impact on the business while you modernize?
  • How do you deal with a backlog of change when you are also trying to modernize?
  • Pros and cons of hand-building vs automating data warehouse development
  • What new skills are needed?
  • Delivering new business value whilst in the process of modernizing
  • How do you involve business professionals in the modernization effort?

Instructor

Mike Ferguson

Mike Ferguson is the Managing Director of Intelligent Business Strategies Limited. As an independent IT industry analyst and consultant, he specializes in BI/Analytics and data management. With over 40 years of IT experience, Mike has consulted for dozens of companies on BI/analytics, data strategy, technology selection, data architecture and data management.

Mike is also conference chairman of Big Data LDN, the fastest-growing data and analytics conference in Europe and a member of the EDM Council CDMC Executive Advisory Board. He has spoken at events all over the world and written numerous articles.

Formerly, he was a principal and co-founder of Codd and Date Europe Limited – the consultancy of relational pioneers E.F. Codd and C.J. Date – and a Chief Architect at Teradata working on the Teradata DBMS.

He teaches popular master classes in Data Warehouse Modernization, Big Data Architecture & Technology, How to Govern Data Across a Distributed Data Landscape, Practical Guidelines for Implementing a Data Mesh (Data Catalog, Data Fabric, Data Products, Data Marketplace), Real-Time Analytics, Embedded Analytics, Intelligent Apps & AI Automation, Migrating your Data Warehouse to the Cloud, Modern Data Architecture and Data Virtualisation & the Logical Data Warehouse.

Dates

This course is only available as Customer Specific Training, whereby we can deliver private courses arranged at both a location (or virtual) and time to suit you, covering the right content to address your specific learning needs. Contact us by e-mail at info@q4k.com.

Copyright ©2023-2024 quest for knowledge