Data Governance

What it is, why it matters, and best practices. This guide provides definitions, frameworks, and practical advice to help you understand and perform modern data governance.

Illustration of data governance stages: planning, implementation, and monitoring. Data governance ensures data quality and usage across an organization.

What is Data Governance?

Data governance refers to the set of roles, processes, policies, and tools that ensure proper data quality throughout the data lifecycle and proper data usage across an organization. Data governance allows users to more easily find, prepare, use, and share trusted datasets on their own, without relying on IT.

Why is it Important?

The primary benefit of data governance is providing the high-quality data necessary for data analytics and BI tools. The insights gained from these tools result in better business decisions and improved performance. Additional benefits include:

  • Improved data accuracy, completeness, and consistency

  • Prevention of data misuse

  • Agreement on common data definitions

  • Removal of data silos between departments and systems

  • Increased trust in data for analytics and decision making

  • Easier data discovery, making data more widely available

  • Better compliance with data privacy laws and other government regulations such as the EU General Data Protection Regulation (GDPR) and the US Health Insurance Portability and Accountability Act (HIPAA)

Data Governance Framework

The three main components of a data governance framework are people, process, and technology.

PEOPLE: For your governance program, you should consider including the following roles:

  • Steering Committee: Made up of the Chief Data Officer (and/or the head of IT) and executives from each business unit, this group sets the usage policies and data standards. The committee also defines the mission statement and goals for the program, as well as how its success will be measured.

  • Governance Team: Led by a data governance manager, this team implements and maintains the systems and tools. It’s typically composed of data architects and other governance specialists from the IT department.

  • Data Stewards: This team manages the datasets and is responsible for enforcing the rules and supporting the day-to-day needs of the business.

PROCESS: You’ll also need formal processes (or activities) to ensure consistent execution and enforcement of the usage policies and data standards set by the steering committee. These processes can be described in flowcharts that make the inputs and tasks for each use case clear.

TECHNOLOGY: As the name suggests, this component refers to the tools and techniques used to efficiently maintain and manage the security, integrity, lineage, usability, and availability of data. Modern tools can automate most aspects of managing a governance program. For example, a governed data catalog profiles and documents every data source and defines who in an organization can take which actions on which data.
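To make the "who can take which actions on which data" idea concrete, here is a minimal sketch of a governed catalog in Python. It is not how any particular product works: the class names, role names, actions, and the `sales_orders` dataset are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One documented data source in a hypothetical governed catalog."""
    name: str
    owner: str
    description: str
    # Maps a role name to the set of actions that role may take.
    permissions: dict[str, set[str]] = field(default_factory=dict)


class DataCatalog:
    """Keeps catalog entries and answers access questions about them."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def is_allowed(self, role: str, action: str, dataset: str) -> bool:
        entry = self._entries.get(dataset)
        if entry is None:
            return False  # datasets missing from the catalog are denied
        return action in entry.permissions.get(role, set())


# Hypothetical example entry; roles and actions are assumptions.
catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="sales_orders",
    owner="finance-stewards",
    description="Order-level sales transactions",
    permissions={
        "data_steward": {"read", "edit", "publish"},
        "data_consumer": {"read"},
    },
))

print(catalog.is_allowed("data_consumer", "read", "sales_orders"))
print(catalog.is_allowed("data_consumer", "edit", "sales_orders"))
```

Denying access to anything not registered is a deliberate choice in this sketch: a dataset has to be documented before anyone can use it.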

This 2-minute video describes how data engineers, data stewards, and data consumers work with a data catalog as part of a robust data governance process.

Manage Quality and Security in the Modern Data Analytics Pipeline

Data Governance Best Practices

While you set up the framework described above, keep in mind these three best practices to ensure you’re successful right out of the gate.

  • Write a glossary

  • Map and classify your data

  • Establish a data catalog
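As a rough illustration of the first two practices, the sketch below keeps a small glossary of agreed definitions and classifies columns by sensitivity based on their names. The terms, definitions, patterns, and sensitivity labels are all assumptions made up for this example; a real program would take them from its steering committee's standards.

```python
import re

# Hypothetical glossary: one agreed definition per business term.
GLOSSARY = {
    "customer": "A person or organization with at least one completed order.",
    "churn": "A customer with no completed order in the last 12 months.",
}

# Hypothetical rules, checked in order: column-name pattern -> label.
CLASSIFICATION_RULES = [
    (re.compile(r"ssn|passport|diagnosis", re.IGNORECASE), "restricted"),
    (re.compile(r"email|phone|address|birth", re.IGNORECASE), "confidential"),
]


def classify_column(column_name: str) -> str:
    """Return the label of the first matching rule, else 'internal'."""
    for pattern, label in CLASSIFICATION_RULES:
        if pattern.search(column_name):
            return label
    return "internal"


def classify_table(columns: list[str]) -> dict[str, str]:
    """Build a simple data map of column name -> sensitivity label."""
    return {column: classify_column(column) for column in columns}


data_map = classify_table(["customer_id", "email_address", "ssn", "order_total"])
for column, label in data_map.items():
    print(f"{column}: {label}")
```

Name-based matching is only a starting point. It will miss sensitive data stored under unhelpful column names, so the resulting map still needs review by a data steward.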

A dashboard screenshot of Qlik Sense’s enterprise-scale data catalog

The Role of Data Lineage

Data lineage refers to the process of tracking all changes made to data on its journey from source to current location. Data lineage tools help you understand and visualize these changes and data flows so you can know where any specific piece of data came from, how it split and merged with other data, and what transformations have been applied.
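The idea can be sketched as a small graph in which each edge records where a dataset came from and which transformation produced it. This is a simplified illustration, not the model used by any specific lineage tool; the class, the dataset names, and the transformation descriptions are hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Records which datasets feed which, and how they were transformed."""

    def __init__(self):
        # target dataset -> list of (source dataset, transformation)
        self._upstream = defaultdict(list)

    def record(self, source, target, transformation):
        self._upstream[target].append((source, transformation))

    def trace(self, dataset):
        """Return every upstream step as (source, target, transformation)."""
        steps, seen, stack = [], set(), [dataset]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            for source, transformation in self._upstream.get(current, []):
                steps.append((source, current, transformation))
                stack.append(source)
        return steps

    def root_sources(self, dataset):
        """Return the upstream datasets that have no recorded source."""
        upstream = {source for source, _, _ in self.trace(dataset)}
        return sorted(d for d in upstream if not self._upstream.get(d))


# Hypothetical pipeline: two source systems merge into one reporting table.
lineage = LineageGraph()
lineage.record("crm.contacts", "staging.customers", "deduplicate on email")
lineage.record("erp.orders", "staging.orders", "filter cancelled orders")
lineage.record("staging.customers", "mart.revenue_by_customer", "join on customer_id")
lineage.record("staging.orders", "mart.revenue_by_customer", "join and sum order totals")

print(lineage.root_sources("mart.revenue_by_customer"))
```

Calling `trace` on the reporting table lists each merge and transformation along the way, which is the information needed to follow an error back toward its origin.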

In a data governance framework, a data steward or data engineer would use a lineage visualization like the example below to confirm they can trust the data and to trace any errors back to their root cause.


Key Challenge: Balancing Speed & Risk

Governance has traditionally focused on the management of finished data such as financial close metrics, regulatory submissions, and key performance indicators. This type of data requires formal definitions and high data quality.

But today’s advanced data science and data analytics often use raw and semi-finished data, which creates a tension between data providers and data consumers. Providers work hard to provision data responsibly, to everyone, without putting the business at risk. Consumers want data for their projects immediately.

The tiered system shown below offers a solution to this challenge. The funnel addresses different user needs with different types of data, applying increasing scrutiny and quality standards as the data works its way through the system.

This system helps the enterprise governance function maintain a breadth of understanding across the enterprise, including restricting access to sensitive data, as well as a depth of understanding for a smaller number of critical data assets.
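One way to picture a tiered approach is as a series of gates, each with stricter requirements than the last. The sketch below is an assumption-heavy illustration: the tier names (`raw`, `curated`, `certified`), the completeness thresholds, and the sign-off rule are all hypothetical and are not taken from the system described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_completeness: float         # required share of non-missing values
    requires_definition: bool       # a formal business definition must exist
    requires_steward_signoff: bool  # a data steward must approve the dataset


# Hypothetical tiers, ordered from least to most scrutiny.
TIERS = [
    Tier("raw", 0.0, False, False),
    Tier("curated", 0.90, True, False),
    Tier("certified", 0.99, True, True),
]


def completeness(rows, required_fields):
    """Share of required fields that are present and not None."""
    if not rows or not required_fields:
        return 0.0
    total = len(rows) * len(required_fields)
    filled = sum(
        1 for row in rows for name in required_fields
        if row.get(name) is not None
    )
    return filled / total


def highest_tier(rows, required_fields, has_definition, steward_signed_off):
    """Return the name of the strictest tier the dataset qualifies for."""
    score = completeness(rows, required_fields)
    reached = TIERS[0].name
    for tier in TIERS:
        if score < tier.min_completeness:
            break
        if tier.requires_definition and not has_definition:
            break
        if tier.requires_steward_signoff and not steward_signed_off:
            break
        reached = tier.name
    return reached


rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
print(highest_tier(rows, ["id", "amount"], has_definition=True,
                   steward_signed_off=False))
```

Under these assumed rules, data consumers can start working with `raw` data immediately, while only datasets that pass every gate are labeled `certified` for uses such as regulatory reporting.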
