There has been a lot of industry chatter lately about the notion of DataOps (data operations). In short, this is a framework for bringing to the analytics space the same kind of agility that many IT organizations have embraced in application development. The principles of iterative development, sprints, and failing fast can apply equally well here, but before we fall down the tech-jargon rabbit hole, let’s get some perspective on what makes analytics slow in the first place.
Most skilled data analysts and data scientists are quite comfortable with quick ad hoc queries, which is how they spend most of their time. They specialize in asking a creative question they didn’t know they had five minutes before, and they bring a wealth of experience and technical skill to making sense of data in new ways. The challenge for this group is getting their hands on the right data in the first place, or pivoting to a new data source when a hypothesis fails. Failing fast is great, but not if it takes weeks or months to start again. Too often, analysts are told it will be 2-10 weeks before their data is available. The major bottleneck in DataOps, then, lies in aligning the data provisioning pipeline with the analytics team, not in the skills of the analytics community.
But rather than just talk about data and analytics, I think it’s essential to widen the aperture a bit to include context around when the data will be required and the analytics cycle it will generate. The data pipeline must be managed like any operational process, borrowing key manufacturing concepts such as supply chain planning, resource planning, and distribution planning. This requires a new framework for aligning four key items: business strategy, data availability, analytics execution, and analytics operationalization.
Let’s break down the key activities and tools to enable this process.
Business Strategy
At the heart of any analytics problem there is a business need. While this seems obvious, it’s remarkable how difficult alignment is to maintain between analytics execution and the C-suite. This is largely because popular culture has cast modern analytics strategies such as machine learning and artificial intelligence as techy and beyond the understanding of mere mortals. As a result, executives often delegate analytics strategy to the technology organization, which is absolutely the wrong approach. Executives must focus on connecting strategy to business moments that can be altered through data. Amazon and Netflix are deemed masters of “catching the customer in the act of deciding,” but this approach could be far more ubiquitous with a little creativity. Imagine a doctor whose diagnosis is informed by analytics, a banking customer choosing the best lending instrument to expand their home, or an insurance client receiving analytics about climate impact that influence them to invest in flood insurance. Companies need to invest in processes that identify, prioritize, and fund these analytics projects as part of a carefully managed portfolio. They also need to clearly understand how these projects will be measured for value creation.
Data Availability
If we know what strategies are driving analytics, we can shift our attention to the data that might matter for those analytics. Customer attrition? Grab web logs, customer demographics, and transactions. Property risk? Try weather, geospatial data, and loss data. You get the idea. Companies need to focus on establishing a clear, use-case-based catalog of data. They need the ability to onboard, profile, describe, secure, and potentially prepare and obfuscate data quickly in anticipation of analytics sprints. A key concept here is data on demand for analytics. Processes should be robust enough to put data on the shelves ahead of analytics execution, which requires onboarding and cataloging speed measured in hours rather than weeks. Data governance in this context is not about torturing data into truth, but about clearly defining, classifying, and provisioning data assets, at speed and scale, to the people authorized to see them.
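To make “data on demand” a little more concrete, here is a minimal sketch in Python of what a use-case-based catalog entry might look like. The CatalogEntry structure, its field names, and the profile_rows helper are all illustrative assumptions for this post, not a reference to any particular catalog product.

```python
# A minimal, illustrative sketch of a use-case-based data catalog.
# All names here (CatalogEntry, profile_rows, etc.) are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CatalogEntry:
    name: str            # e.g., "web_logs"
    use_cases: list      # e.g., ["customer_attrition"]
    owner: str           # accountable data steward
    classification: str  # e.g., "PII", "internal", "public"
    onboarded_at: datetime = field(default_factory=datetime.utcnow)
    profile: dict = field(default_factory=dict)  # row counts, null rates, etc.

def profile_rows(rows: list) -> dict:
    """Compute a quick profile so analysts can judge fitness for use."""
    if not rows:
        return {"row_count": 0}
    columns = rows[0].keys()
    return {
        "row_count": len(rows),
        "null_rate": {
            c: sum(1 for r in rows if r.get(c) is None) / len(rows)
            for c in columns
        },
    }

# Onboarding, profiling, and cataloging in one pass -- hours, not weeks.
rows = [{"customer_id": 1, "page": "/pricing"}, {"customer_id": 2, "page": None}]
entry = CatalogEntry(
    name="web_logs",
    use_cases=["customer_attrition"],
    owner="data-engineering",
    classification="internal",
    profile=profile_rows(rows),
)
print(entry.profile)  # {'row_count': 2, 'null_rate': {...}}
```

The point of the sketch is the shape of the process: every data set lands with an owner, a classification, and a profile attached, so the governance questions are answered before an analyst ever asks for access.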
Analytics Execution
In a data-democratized world, we need to expect more from our analytics user communities: the ability to read, work with, analyze, and argue with data. We also need to raise the skill level at every position. Report readers should become report builders. Report builders should understand predictive modeling. Predictive modelers need to learn TensorFlow and other machine learning technologies. They need to try and fail quickly. They need to experiment with new and interesting data sets. They need to follow a specific methodology to move from the aspirational value of the business strategy to a committed value that acts as a trigger for business users to evaluate and prioritize against other initiatives. Most importantly, there must be a tie back to the business to quantify potential impact, prioritize for production deployment, and balance against risk.
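As an illustration of “try and fail quickly,” here is a minimal sketch of an analytics-sprint experiment using scikit-learn on synthetic data. The churn framing, the feature names, and the 0.75 quality bar are all assumptions made up for this example; a real sprint would pull the cataloged data sets described above and use a threshold agreed with the business.

```python
# A minimal sketch of a fast analytics experiment (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

# Hypothetical features: tenure, monthly spend, support tickets.
X = rng.normal(size=(1_000, 3))
# Synthetic churn labels loosely tied to the features.
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1_000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# "Committed value" trigger: only escalate to the business if the model
# clears a pre-agreed quality bar; otherwise fail fast and pivot.
print(f"AUC = {auc:.2f}; escalate to business review: {auc >= 0.75}")
```

The discipline matters more than the algorithm: each experiment ends in an explicit keep-or-kill decision, which is what keeps the fail-fast loop measured in days rather than quarters.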
Analytics Operationalization
Now it’s time to deploy the analytic in a production context. This is where the DevOps portion of DataOps kicks in. Does the analytic perform in real time? Is it embedded seamlessly in an application? Once it is put into production, likely in a limited way, operationalization processes must measure impact, and that impact must come full circle to the business strategy. Is the analytic having the desired effect? Should we change our approach? Is the challenge one of concept, analytic utility, or deployment option? This closed loop provides a critical link back to the non-technical community and allows for clear measurement.
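Here is a hedged sketch of what that closed loop might look like in code: log each prediction alongside the eventual business outcome, then report impact back in business terms. The Decision record, the measure_impact function, and the uplift metric are illustrative assumptions, not a prescribed implementation.

```python
# Illustrative closed-loop impact measurement (all names hypothetical).
from dataclasses import dataclass

@dataclass
class Decision:
    prediction: float  # model score at the business moment
    acted_on: bool     # did the application act on the score?
    outcome: float     # eventual business result (e.g., revenue retained)

def measure_impact(decisions: list) -> dict:
    """Compare outcomes where the analytic drove action vs. where it did not."""
    acted = [d.outcome for d in decisions if d.acted_on]
    control = [d.outcome for d in decisions if not d.acted_on]
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return {
        "avg_outcome_with_analytic": avg(acted),
        "avg_outcome_without": avg(control),
        "uplift": avg(acted) - avg(control),
    }

log = [
    Decision(prediction=0.9, acted_on=True, outcome=120.0),
    Decision(prediction=0.2, acted_on=False, outcome=80.0),
]
print(measure_impact(log))  # feeds the review against business strategy
```

Reporting uplift in outcome terms, rather than model metrics, is what lets the non-technical community decide whether to expand, rework, or retire the analytic.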
Let’s keep it real: this is not easy. If you have 30 minutes, you can watch me go into the details of the challenges of this integration at the December 2018 Forrester Conference here. There are significant cultural, procedural, political, and technical challenges in achieving this goal. But a clear vision of the goal line, expressed in business-focused rather than technology-focused language, will most certainly accelerate the journey.