Data Governance

What it is, why it matters, and best practices.

Data Governance Guide

This guide provides definitions, frameworks, and practical advice to help you understand and perform modern data governance.

What is Data Governance?

Data governance refers to the set of roles, processes, policies and tools which ensure proper data quality throughout the data lifecycle and proper data usage across an organization. Data governance allows users to more easily find, prepare, use and share trusted datasets on their own, without relying on IT.

Why is it Important?

The primary benefit of data governance is providing the high-quality data necessary for data analytics and BI tools. The insights gained from these tools result in better business decisions and improved performance. Additional benefits include:

  • Improved data accuracy, completeness, and consistency
  • Prevention of data misuse
  • Agreement on common data definitions
  • Removal of data silos between departments and systems
  • Increased trust in data for analytics and decision making
  • Easier data discovery, making more data available for use
  • Better compliance with data privacy laws and other government regulations such as the EU General Data Protection Regulation (GDPR) and the US Health Insurance Portability and Accountability Act (HIPAA)

Data Governance Framework

The three main components of a data governance framework are people, process, and technology.

A simple graphic showing on the outside the three main components of a data governance framework: people, process, and technology. On the inside are represented the key elements and benefits of a data governance program: policy, access, privacy, security, availability, lineage and quality.

PEOPLE: For your governance program, you should consider including the following roles:

  • Steering Committee: Made up of the Chief Data Officer (and/or the head of IT) and executives from each business unit, this group sets the usage policies and data standards. The committee also defines the mission statement and goals for the program, as well as how its success will be measured.
  • Governance Team: Led by a data governance manager, this team implements and maintains the systems and tools. It’s typically composed of data architects and other governance specialists from the IT department.
  • Data Stewards: This team manages the datasets, enforces the agreed-upon rules, and handles the day-to-day data needs of the business.

PROCESS: You’ll also need formal processes (or activities) to ensure consistent execution and enforcement of the usage policies and data standards set by the steering committee. These processes can be documented in flow charts that make the inputs and tasks for each use case clear.

TECHNOLOGY: As the name suggests, this component refers to the tools and techniques used to efficiently maintain and manage the security, integrity, lineage, usability, and availability of data. Modern tools can automate most aspects of managing a governance program. For example, a governed data catalog profiles and documents every data source and defines who in an organization can take which actions on which data.

Learn how data engineers, data stewards, and data consumers work with a data catalog to easily find, prepare, use and share trusted datasets.
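
To make the catalog’s role concrete, here is a minimal Python sketch of the idea that a governed catalog defines who can take which actions on which data. The role names, dataset names, and check_access helper are assumptions made up for this guide, not the API of any particular product.

    # Minimal sketch of catalog-style access rules: which roles may take
    # which actions on which datasets. All names here are illustrative.
    PERMISSIONS = {
        ("customer_orders", "data_engineer"): {"read", "write", "profile"},
        ("customer_orders", "data_steward"):  {"read", "profile", "approve"},
        ("customer_orders", "data_consumer"): {"read"},
        ("customer_pii",    "data_steward"):  {"read", "approve"},
        # note: no entry grants data_consumer any access to customer_pii
    }

    def check_access(dataset: str, role: str, action: str) -> bool:
        """Return True if the role may perform the action on the dataset."""
        return action in PERMISSIONS.get((dataset, role), set())

    print(check_access("customer_orders", "data_consumer", "read"))  # True
    print(check_access("customer_pii", "data_consumer", "read"))     # False

In practice a catalog product manages rules like these through its own interface; the point is simply that access is resolved from dataset, role, and action rather than granted ad hoc.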

Data Governance Best Practices

As you set up the framework described above, keep these three best practices in mind to ensure you’re successful right out of the gate.

  • Write a glossary

    Developing a data glossary (or dictionary) that defines the business terms and concepts used in your organization gives you consistent business context across multiple tools. For example, everyone should be clear on what qualifies as a “Marketing Qualified Lead” or an “Inactive Customer”.

  • Map and classify your data

    Mapping where your data resides helps you know which system it’s in and how it flows through your organization. Classifying your datasets based on considerations like privacy and sensitivity determines how your policies are applied to each dataset (see the sketch after this list).

  • Establish a data catalog

    Building a clear, use case-based data catalog lets you make different kinds of data available to different kinds of users quickly, without adding risk. Data catalogs give you an indexed inventory of available data assets, along with data lineage information, search functions, and collaboration tools.

A dashboard screenshot of Qlik Sense’s enterprise-scale data catalog
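
To tie the first two practices together, here is a small Python sketch of how a glossary entry and a classification-to-policy mapping might look. Every term definition, classification level, and policy rule in it is a hypothetical example for illustration, not a prescribed standard.

    # Illustrative only: glossary entries and classification-driven policies.
    GLOSSARY = {
        "Marketing Qualified Lead": "A contact who has engaged with marketing "
                                    "content and matches the target-account profile.",
        "Inactive Customer": "A customer with no purchase in the last 12 months.",
    }

    # The policy applied to a dataset follows from its classification.
    CLASSIFICATION_POLICIES = {
        "public":       {"masking": False, "approval_required": False},
        "internal":     {"masking": False, "approval_required": True},
        "confidential": {"masking": True,  "approval_required": True},
    }

    def policy_for(classification: str) -> dict:
        """Look up the governance policy for a dataset's classification level."""
        return CLASSIFICATION_POLICIES[classification]

    print(GLOSSARY["Inactive Customer"])
    print(policy_for("confidential"))  # {'masking': True, 'approval_required': True}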

Key Challenge: Balancing Speed & Risk

Governance has traditionally focused on the management of finished data such as financial close metrics, regulatory submissions, and key performance indicators. This type of data requires formal definitions and high data quality.

But today’s advanced data science and data analytics often use raw and semi-finished data, and this creates a tension between data providers and data consumers. Providers work hard to provision data responsibly to everyone without putting the business at risk, while consumers want data for their projects immediately.

The tiered system shown below offers a solution to this challenge. The funnel addresses different user needs with different types of data, applying increasing scrutiny and quality standards as the data works its way through the system.

Graphic representing the Enterprise Data Governance Funnel. It shows a tiered system which addresses different user needs with different types of data, applying increasing scrutiny and quality standards as the data works its way through the system.

This system helps the enterprise governance function maintain both a breadth of understanding across the enterprise, including applying restrictions to sensitive data, and a depth of understanding for a smaller number of critical data assets.
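
As a rough illustration of the funnel (the tier names, completeness thresholds, and review rule below are assumptions for this sketch, not part of any standard), a dataset might be promoted to a higher tier only once it meets stricter quality and review requirements:

    # Hypothetical tiers with increasingly strict promotion requirements.
    TIER_REQUIREMENTS = {
        "raw":       {"min_completeness": 0.0,  "steward_review": False},
        "curated":   {"min_completeness": 0.9,  "steward_review": False},
        "certified": {"min_completeness": 0.99, "steward_review": True},
    }

    def eligible_tier(completeness: float, reviewed: bool) -> str:
        """Return the highest tier a dataset currently qualifies for."""
        best = "raw"
        for tier, req in TIER_REQUIREMENTS.items():  # ordered loosest to strictest
            if completeness >= req["min_completeness"] and (reviewed or not req["steward_review"]):
                best = tier
        return best

    print(eligible_tier(completeness=0.95, reviewed=False))   # curated
    print(eligible_tier(completeness=0.995, reviewed=True))   # certified

Under a scheme like this, consumers doing exploratory work can pull from the lower tiers immediately, while finished reporting and regulatory submissions draw only on certified data.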

DataOps for Analytics

Modern data integration that delivers real-time, analytics-ready and actionable data to any analytics environment, from Qlik to Tableau, Power BI and beyond.

  • Real-Time Data Streaming (CDC)

    Extend enterprise data into live streams to enable modern analytics and microservices with a simple, real-time and universal solution.
  • Agile Data Warehouse Automation

    Quickly design, build, deploy and manage purpose-built cloud data warehouses without manual coding.
  • Managed Data Lake Creation

    Automate complex ingestion and transformation processes to provide continuously updated and analytics-ready data lakes.
  • Enterprise Data Catalog

    Enable analytics across your enterprise with a single, self-service data catalog.

Learn more about data integration with Qlik.