This, of course, creates friction. Lots of friction. Let's
evaluate their respective points of view (embellished, of course, for dramatic effect):
The data analyst/data scientist:
“These Enterprise Governance data police are always getting in my way. First, they block my access to data. Then, they make me waste weeks jumping through hoops to get data only to find that it’s not what I needed in the first place. What I really want is good data faster, not some fantasy land data model that someone in corporate dreamed up. And if someone tells me about how important metadata and ontologies are one more time, my head is going to explode. Just trust me and give me the data and get out of my way! Then I can make you some money!”
The Chief Data Officer:
“I am so sick and tired of these data cowboys grabbing whatever they want from wherever they want, working around the process, and putting the whole company at risk. Don’t they know that we are one hacker away from being out of business or the punchline of every data breach joke? This is not a game. Also, they want everything now. Please, get in line, buddy. And when I finally do move heaven and earth to get them what they want, these data bad boys just get execs revved up with data that does not reconcile to the books and is based on their own unique brand of fantasy. I have regulators crawling all over me and you are making it much worse.”
My goal is to create peace. Let's call it a Big Data Accord.
Let’s start with a different understanding of data through the lens of a data supply chain. Just as a manufacturing company has raw materials, semi-finished goods, and finished goods, so does data exist in the organization. Raw data exists in applications to make a transaction happen. It may not align with other applications, but that is not why it’s there. It’s simply there to make sure that customers can order, that employees can record expenses, or that purchasing departments can create POs. Semi-finished data is created every day in the organization for a discrete purpose – a spreadsheet with the sales forecast, a BI application with service incidents, or a real estate view of how many seats are required, based on employee and contractor data. Finally, finished goods are the analytic gold. Financial close data, regulatory submissions, and critical corporate KPIs require one formal definition with high data quality and often have an SLA.
Traditionally, data governance has focused on the management of finished goods, whereas the most clever data analysts rely on raw and semi-finished goods, thereby creating much of the friction in the data provisioning process. I would like to propose a new model for governing data that reflects both the expectations of the business user and the reasonable expectations of data governance and quality that the enterprise can support, with each level applying increasing scrutiny and quality standards as data moves through the system.
A key consequence of this model is that users of data at each step of the data curation process should have a reasonable understanding of the relative quality and reliability of the data they are using. Want to put a KPI on the CEO’s desk? It had better be curated, leaving no room for argument about the data. Want to create a predictive model of how weather will affect sales? “Available” is good enough. Once I figure out which of the sources feeding my model are useful, I can then insist on a higher degree of curation. This helps the enterprise governance function focus on breadth of understanding across the enterprise, including restricting access to sensitive data, as well as depth for a smaller number of critical data assets.
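The tiered model above can be sketched in code. This is a minimal illustration, not a real governance tool; the class names, tier labels, and expectation strings are my own inventions chosen to mirror the supply chain analogy.

```python
from dataclasses import dataclass
from enum import Enum


class CurationTier(Enum):
    """Tiers of the data supply chain, from least to most governed."""
    RAW = 1            # application data: exists only to complete transactions
    SEMI_FINISHED = 2  # departmental artifacts: spreadsheets, BI extracts
    FINISHED = 3       # analytic gold: KPIs, financial close, regulatory data


@dataclass
class DataAsset:
    name: str
    tier: CurationTier

    def quality_expectation(self) -> str:
        """What a consumer may reasonably expect at this asset's tier."""
        return {
            CurationTier.RAW: "available; fit only for its source transaction",
            CurationTier.SEMI_FINISHED: "fit for one discrete departmental purpose",
            CurationTier.FINISHED: "one formal definition, high quality, SLA-backed",
        }[self.tier]


# A weather feed for an exploratory sales model only needs to be available...
weather = DataAsset("weather_feed", CurationTier.RAW)
# ...while a CEO-facing KPI must be fully curated.
revenue_kpi = DataAsset("quarterly_revenue", CurationTier.FINISHED)

print(weather.quality_expectation())
print(revenue_kpi.quality_expectation())
```

The point of making the tier explicit on every asset is exactly the expectation-setting described above: a consumer can see at a glance how much scrutiny a given source has received before deciding what to build on it.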
Expectations are managed. Analyst users are happy. Profit ensues. Peace at last.
If only it were that easy. In my next blog, I will focus on tools and processes for enabling this data funnel. (Hint: it won’t be your father’s data governance suite.)