The Paradox of Data Quality

The More You Ask a User to Provide, the Less You Can Trust the Result

In the past few weeks, I have been settling into my new role as the Chief Data Officer (CDO) at Qlik. Like most companies in 2020, Qlik is focused on capitalizing on new data and analytics trends to make our decision-making more agile. Unlike most companies, we have the luxury of using Qlik’s own tools, which include the Qlik Data Integration software suite for data streaming and cataloging and the Qlik Data Analytics suite for analysis, visualization and action.

As I got started, I met with many business and operations people across the company to understand information gaps and analytics aspirations across the enterprise. In most cases, there were some traditional process challenges related to data, and I carefully took notes on which data sets required prioritization to go into our enterprise catalog. Other conversations focused on the evolution of analytics from traditional KPIs to machine learning and predictive modeling against complex telemetry data as our business model evolved from on-prem software to SaaS distribution. But one conversation stopped me in my tracks. “Don’t bother trying to do that analysis,” my colleague told me. “The data just doesn’t exist.”

Doesn’t exist? How can it be that the transactions that ran our business did not leave a clear breadcrumb trail of what had taken place to enable descriptive and predictive analytics?

Over the next few days, I came to understand an interesting paradox in our systems design. As an analytics company, we had added a great number of attributes to our transactions so that we could report on them later. But over time, so many people had asked for so many things that business users became overwhelmed by the data requests and simply left a number of these fields blank. The paradox, in short, was that our very thirst for more data was, in fact, giving us less.

I started to do some research on this and stumbled upon Hick’s Law, which has been used in software design for decades. It states that the time it takes application users to make a choice grows logarithmically with the number of choices they have. In other words, too many choices can induce a certain paralysis in decision making. It seems only logical that this could also apply to the number of fields that someone has to update. And, I suspect, the human reaction to this stress is to ignore anything that isn’t mandatory, leading directly to a data quality challenge that is an obstacle to the very analytics that the data was meant to enable.
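To make the logarithmic relationship concrete, here is a minimal sketch of Hick’s Law in its common form, T = a + b * log2(n + 1), where n is the number of equally likely choices. The function name and the constants a and b are illustrative assumptions, not measured values, and the law describes choices rather than form fields, so applying it to fields remains my own extrapolation.

```python
import math


def hick_decision_time(n_choices: int, b: float = 0.2, a: float = 0.0) -> float:
    """Estimate decision time (seconds) for n equally likely choices.

    Common form of Hick's Law: T = a + b * log2(n + 1).
    a and b are hypothetical constants; real values are found empirically.
    """
    if n_choices < 0:
        raise ValueError("n_choices must be non-negative")
    return a + b * math.log2(n_choices + 1)


if __name__ == "__main__":
    # Each doubling of (n + 1) adds the same fixed increment b.
    for n in (1, 3, 7, 15, 31):
        print(n, round(hick_decision_time(n), 2))
```

The point of the sketch is the shape of the curve: each added option costs a little less than the one before, but the total time never stops growing.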

This got me thinking about conventional data quality practices and how flawed they can be. We choose critical data elements, design periodic profile checks, and put data stewards in place to monitor and measure the data that matters to us. This is all constructive, but there is a well-known adage that I like that fits here: “If you want to remove pollution from a river, you should start by stopping the contamination of the river in the first place.” In other words, if we need data to be reliable, not only do we need to monitor it, but we also need to put simple processes and useful controls in place to get the data right at the time of entry.

Our response to this at Qlik is that, in addition to establishing standard data stewardship and quality practices, we are developing process owners who have the authority to automate and simplify screens for users, to implement checks on the spot as data is being entered, and to assume responsibility not only for the execution of the transaction, but also for that transaction’s impact on downstream transactional and analytical needs.
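As an illustration of what a check “on the spot” might look like, here is a small sketch of entry-time validation. It is not a description of Qlik’s systems: the FieldRule type, the validate_entry function, and the field names are all hypothetical. The idea is to keep the set of required fields short and to reject a blank or invalid value before the transaction is saved, rather than discovering it later in a profile check.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical rule for one field on a data entry screen."""
    name: str
    required: bool = True
    allowed: Optional[FrozenSet[str]] = None  # None means any value is accepted


def validate_entry(record: Dict[str, str], rules: List[FieldRule]) -> List[str]:
    """Return a list of problems; an empty list means the record can be saved."""
    problems = []
    for rule in rules:
        value = (record.get(rule.name) or "").strip()
        if not value:
            # Blank optional fields are fine; blank required fields are not.
            if rule.required:
                problems.append(f"{rule.name}: required field is blank")
            continue
        if rule.allowed is not None and value not in rule.allowed:
            problems.append(f"{rule.name}: {value!r} is not an allowed value")
    return problems


# Illustrative rules: one mandatory field with a fixed list, one optional field.
RULES = [
    FieldRule("region", allowed=frozenset({"EMEA", "AMER", "APAC"})),
    FieldRule("notes", required=False),
]
```

Calling validate_entry({"region": " "}, RULES) reports the blank required field at the moment of entry, which is when the user is best placed to fix it.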

So, in short, who owns quality? Well, certainly more than one person or team. A process owner designs the right process and requirements to enable automation, IT owns the implementation of systems that are simple and meet these needs, and the office of the CDO owns the quality check in case anything falls through the cracks. The establishment of these roles and the relationships that bind them is the foundation of the Qlik analytics strategy.

Our own @JoeDosSantos writes about a paradox in #data quality - a thirst for more data can give us less if we don't control for quality. Read his latest blog post.
