Data Catalogs
description.
This one-day class looks in detail at what a data catalog is and why it is important to implement one. We will look at:
- The data complexity challenges that companies are dealing with
- How data catalogs can be used to deal with this problem
- Why data catalogs are critical to making a unified approach to enterprise data governance possible
- How data catalogs can accelerate data engineering to produce ‘business-ready’ reusable data products and publish them in a data marketplace for analytical use
- Why data catalogs are central to creating an enterprise knowledge graph and semantic layer for AI Agents
Why attend
Attendees will learn:
- How data catalogs work and what their capabilities are
- How to use data catalogs to discover, classify and catalog data in multiple data stores on premises, across multiple clouds, and at the edge
- How to use data catalogs in an enterprise data governance program to systematically classify data and set policies to govern classified data and content across a distributed data estate from a single place. This includes automatic discovery and classification of sensitive data, governance of data access security, data privacy, data loss prevention, data sharing, data usage, data retention, and data quality
- How to use data catalogs to discover data that can be engineered in data integration pipelines to produce data products that can be published in a data marketplace
- How data catalogs can be used to create an enterprise knowledge graph / semantic layer with one or more domain-specific ontologies to provide context for AI agents
Who should attend
This course is intended for business and IT professionals responsible for data engineering, data product provisioning, enterprise data governance (including data access security, data privacy, data sharing, data usage, data retention, and data quality) of structured and unstructured data, and also for those responsible for AI who need to implement an enterprise ontology and semantic layer for AI Agents. This includes Chief Data Officers, Heads of AI, Citizen and professional IT Data Engineers, Data Architects, Data Scientists, Heads of Data Governance, Data Stewards, Solution Architects, and Enterprise Architects.
Prerequisites
This course assumes a basic understanding of data governance, data management, metadata, data warehousing, data cleansing, data integration, enterprise ontologies, and how context is provided to AI agents.
outline.
Introduction to Data Catalogs
This module examines the typical setups and challenges companies face today to explain why data catalogs are needed.
- The increasingly complex distributed data landscape
- The growth in new data sources
- Disparate operational transaction systems
- The emergence of data mesh and its impact on data engineering and data architecture
- The impact of ungoverned data
- Major requirements facing companies with respect to data
- What is a data catalog and why have one?
- What is a data catalog?
- Why have a data catalog? - data catalog use cases
- The data catalog software marketplace
- Core data catalog capabilities
The Importance of a Business Glossary
This session looks at the need to understand your data landscape from a business perspective. The key is to establish a common business vocabulary in the business glossary of a data catalog, creating common data names and definitions for your data. This enables you to understand the meaning of data, to search for and govern data across your data estate from a business perspective, and to use this business metadata to help create an enterprise semantic layer for AI agents.
- Data standardisation using a common business vocabulary
- The purpose of a common vocabulary in data governance and in semantic layers for AI
- Business glossary software – a data catalog capability (e.g. Alation, Amazon Glue, Collibra, Informatica IDMC, IBM Watson Knowledge Catalog, Microsoft Azure Purview, Qlik (Talend), SAS Business Data Network, TopQuadrant TopBraid EDG)
- Planning for a business glossary
- Glossary roles and responsibilities
- Glossary term submission, voting, approval, and dispute resolution processes
- Approaches to creating a common vocabulary
- Organising data definitions in a business glossary
- Business glossaries, taxonomies, hierarchies
- The role of a data concept model in establishing semantic meaning
- Utilising a common vocabulary in data modelling, BI, AI, ETL, ESB, APIs, & MDM
Automated Data Discovery, Cataloguing and Business Glossary Mapping
Having defined your data, this session looks at discovering what data you have, where it is, and how it maps to your business glossary to provide a business understanding of the meaning of data in your data estate.
- The critical role of data catalog software in understanding your data estate
- The automated data discovery process
- Registering data sources for discovery
- Automated data discovery and data quality profiling using a data catalog
- Mapping data assets to a business glossary
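To make the mapping step concrete, here is a minimal sketch of how discovered physical columns can be matched to business glossary terms by normalised-name comparison. All column and term names here are hypothetical; real data catalogs also use data profiling, lineage, and ML-based semantic matching rather than name matching alone.

```python
# Minimal sketch: map discovered physical columns to business glossary terms
# by normalised-name matching. Names are illustrative, not from any product.

def normalise(name: str) -> str:
    """Lower-case a name and strip common separators for comparison."""
    return name.lower().replace("_", "").replace(" ", "").replace("-", "")

# Business glossary: common business term -> definition
glossary = {
    "Customer Name": "The full legal name of a customer",
    "Order Date": "The date on which an order was placed",
}

# Columns found by scanning registered data sources
discovered_columns = ["customer_name", "CUST_ADDR", "order-date"]

# Lookup of normalised glossary terms back to their display names
lookup = {normalise(term): term for term in glossary}

mappings = {}
unmapped = []
for col in discovered_columns:
    term = lookup.get(normalise(col))
    if term:
        mappings[col] = term      # physical column -> business term
    else:
        unmapped.append(col)      # left for manual stewardship

print(mappings)   # {'customer_name': 'Customer Name', 'order-date': 'Order Date'}
print(unmapped)   # ['CUST_ADDR']
```

Columns that cannot be matched automatically (here `CUST_ADDR`) are exactly the ones a data steward would map by hand in the catalog.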
Classifying Data and Content to Know How to Govern It
This session looks at manually and automatically labelling data using a data catalog to know how to govern it using predefined classifiers, user-defined classification schemes, and trainable classifiers. It then looks at how classified data shows up in a data catalog and how policies can be assigned to labelled data to govern it across your data estate.
- What is data classification?
- Automated sensitive data type detection and classification using pre-defined trained classifiers
- Creating your own data classification schemes for data confidentiality and retention
- Manually classifying content using your own classification scheme
- Training classifiers to automatically label unstructured content
- Using trained classifiers to auto-label unstructured content in the cloud and on-premises
- Using your own classification schemes to find data within a data catalog
- Automatically classifying sensitive structured data using a data catalog
- Using classification insights to understand sensitive data proliferation and data redundancy across your data estate
- Setting policies in a data catalog to govern data across your data estate
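The simplest form of a pre-defined classifier is a pattern rule applied to sampled column values. The sketch below illustrates the idea with two illustrative patterns and a match threshold; vendor products ship far richer rule sets plus trainable ML classifiers, and none of the labels or thresholds here come from any specific product.

```python
import re

# Minimal sketch of rule-based sensitive-data classification, the simplest
# form of the pre-defined classifiers a data catalog applies automatically.
# Patterns, labels, and the threshold are illustrative assumptions.
CLASSIFIERS = {
    "Email Address": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "US SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_column(sample_values, threshold=0.8):
    """Return the sensitivity label whose pattern matches enough samples."""
    for label, pattern in CLASSIFIERS.items():
        hits = sum(1 for v in sample_values if pattern.match(v))
        if sample_values and hits / len(sample_values) >= threshold:
            return label
    return None  # unclassified; a data steward can label it manually

print(classify_column(["ann@example.com", "bob@example.org"]))  # Email Address
print(classify_column(["123-45-6789", "987-65-4321"]))          # US SSN
print(classify_column(["red", "blue"]))                         # None
```

Once a column is labelled this way, the label is what governance policies attach to, which is why classification must come before policy enforcement.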
Data Governance and Stewardship
This session looks at the requirements for governing data in a modern enterprise and how they can be met using a data catalog.
- Key requirements for governing data and content across a distributed data landscape
- What do you need to know to govern data?
- Introducing a data governance framework to help meet the challenge
- People – key roles, responsibilities and data governance operating model
- Core processes needed to establish and govern commonly understood data
- The role of the data catalog in governing data for use in analytics and AI
- Core data governance capabilities needed
- Tasks involved in governing a distributed data estate
Accelerating Data Engineering Using a Data Catalog
This session looks at the requirements for accelerating data engineering in a modern enterprise and how they can be met using a data catalog and data fabric software.
- The role of the data catalog and data fabric in data engineering, data provisioning, and data sharing
- Defining data products in a business glossary
- Automatically discovering, mapping and classifying data in a data catalog
- Data catalog integration with Data Fabric
- Building data engineering pipelines to produce data products
- Creating a data marketplace as a data catalog application to share business-ready data products
- Publishing and consuming data products using a data marketplace, data contracts, and data catalog metadata
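A data contract is the agreement published alongside a data product in the marketplace. The sketch below shows one possible minimal shape as a Python dataclass; every field name here is an illustrative assumption, and real contract formats (e.g. the Open Data Contract Standard) cover much more, such as quality rules, SLAs, and usage terms.

```python
from dataclasses import dataclass

# Minimal sketch of a data contract for a data product published in a
# data marketplace. Field names and defaults are illustrative assumptions.

@dataclass
class DataContract:
    product_name: str
    owner: str
    version: str
    schema: dict                      # column name -> logical type
    classification: str = "internal"  # governance label from the catalog
    refresh_sla: str = "daily"        # how often the product is refreshed

contract = DataContract(
    product_name="orders_curated",
    owner="sales-data-team",
    version="1.0.0",
    schema={"order_id": "string", "order_date": "date", "amount": "decimal"},
)

print(contract.product_name, contract.version, contract.refresh_sla)
```

Consumers discover the product via the catalog's metadata and rely on the contract, not on inspecting the underlying pipeline, which is what makes the product reusable.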
Building an Enterprise Semantic Layer for AI Agents Using a Data Catalog
This session looks at how a data catalog can be used to create an enterprise knowledge graph and common business meaning in a semantic layer to provide context for AI Agents.
- The role of the data catalog as a knowledge graph for AI
- Using a data catalog to capture business and technical metadata about your data and data relationships, and store it in a graph
- Capturing and inferring lineage to understand dependencies
- Provisioning business context metadata to AI agents, via an MCP server and GraphRAG queries, so that they understand meaning
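The idea of GraphRAG-style retrieval over catalog metadata can be sketched with a tiny in-memory triple store: starting from a dataset node, collect the reachable facts (columns, glossary mappings, definitions, lineage) and hand them to an agent as context. All node names and predicates below are hypothetical; a real catalog would expose its knowledge graph through a graph store and, for agents, an MCP server endpoint.

```python
# Minimal sketch of a catalog metadata graph used to provide business
# context to an AI agent. Triples and predicate names are illustrative.

# (subject, predicate, object) triples: datasets, glossary terms, lineage
triples = [
    ("orders_dataset", "has_column", "order_date"),
    ("order_date", "maps_to_term", "Order Date"),
    ("Order Date", "defined_as", "The date on which an order was placed"),
    ("orders_dataset", "derived_from", "orders_raw"),
]

def context_for(node: str, depth: int = 3) -> list:
    """GraphRAG-style retrieval: collect triples reachable from a node."""
    frontier, seen, result = {node}, set(), []
    for _ in range(depth):
        next_frontier = set()
        for s, p, o in triples:
            if s in frontier and (s, p, o) not in seen:
                seen.add((s, p, o))
                result.append((s, p, o))
                next_frontier.add(o)   # follow the edge outward
        frontier = next_frontier
    return result

for t in context_for("orders_dataset"):
    print(t)
```

Note that the business definition of "Order Date" is only reached by walking the graph through the glossary mapping, which is why the catalog's graph structure, not just its flat metadata, is what gives an agent usable semantic context.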
instructor.
Mike Ferguson
Mike Ferguson is the Managing Director of Intelligent Business Strategies Limited. As an independent IT industry analyst and consultant, he specializes in BI/Analytics and data management. With over 40 years of IT experience, Mike has consulted for dozens of companies on BI/analytics, data strategy, technology selection, data architecture and data management.
Mike is also conference chairman of Big Data LDN, the fastest-growing data and analytics conference in Europe, and a member of the EDM Council CDMC Executive Advisory Board. He has spoken at events all over the world and written numerous articles.
Formerly he was a principal and co-founder of Codd and Date Europe Limited, the company of the inventors of the relational model, and a Chief Architect at Teradata working on the Teradata DBMS.
He teaches popular master classes in Data Warehouse Modernization, Big Data Architecture & Technology, How to Govern Data Across a Distributed Data Landscape, Practical Guidelines for Implementing a Data Mesh (Data Catalog, Data Fabric, Data Products, Data Marketplace), Real-Time Analytics, Embedded Analytics, Intelligent Apps & AI Automation, Migrating your Data Warehouse to the Cloud, Modern Data Architecture and Data Virtualisation & the Logical Data Warehouse.
dates & price.
pricing.
The fee for this course is EUR 725,00 (+VAT) per person.
We offer the following discounts:
- 10% discount for groups of 2 or more students from the same company registering at the same time
- 20% discount for groups of 4 or more students from the same company registering at the same time
Note: Groups that register at a discounted rate must retain the minimum group size, or the discount will be revoked. Discounts cannot be combined.