Like many of us hunkering down at home these days, I was recently browsing through my media streaming services looking for a movie to watch when I came across an interesting title: The Hummingbird Project. Turns out it’s very similar to the Michael Lewis book Flash Boys (the movie’s director says he was inspired by the book).
Both stories center on entrepreneurs who decide to build a fiber-optic connection in an exact straight line between a Midwestern city and New York. The idea is to cut the transmission time for a stock trade from 17 milliseconds to 13 (or to 16 milliseconds in the movie). Even though the link costs hundreds of millions of dollars to build, they calculate that the payoff is worth it.
This made me think about how Data Architects are wrestling with the same issue, just at a different scale. How can they set up an accessible data source without it taking months or years? And how can they ensure that the new data source stays current? Users expect data that is up to the minute, or even up to the last few seconds.
It starts with adopting a new approach to the overall problem: DataOps. We've written a number of blogs about this emerging concept. DataOps builds on the methods of DevOps, which combines software development and IT operations to improve the velocity, quality, predictability and scale of software development and deployment. DataOps seeks to bring similar improvements to delivering data for analytics, providing the practices, processes and technologies for building and enhancing data pipelines to quickly meet business needs.
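To make that concrete: in a DataOps pipeline, a requirement like "the data must be up to the minute" becomes an automated check that runs on every pipeline execution, just as DevOps teams gate deployments on tests. Here is a minimal sketch in Python, assuming a hypothetical trades table whose event_time column stores ISO 8601 UTC timestamps; the table, column and one-minute threshold are illustrative, not a prescribed design.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Illustrative freshness requirement: data no older than one minute.
MAX_STALENESS = timedelta(minutes=1)

def check_freshness(conn: sqlite3.Connection) -> None:
    """Fail the pipeline run if the newest record is too old.

    Assumes a hypothetical `trades` table with an `event_time` column
    holding ISO 8601 UTC timestamps (e.g. '2024-01-01T12:00:00+00:00').
    """
    (latest,) = conn.execute("SELECT MAX(event_time) FROM trades").fetchone()
    if latest is None:
        raise RuntimeError("trades table is empty")
    age = datetime.now(timezone.utc) - datetime.fromisoformat(latest)
    if age > MAX_STALENESS:
        raise RuntimeError(f"trades data is stale: newest record is {age} old")
```

Running a gate like this on every load means a stale feed stops the pipeline and alerts someone, instead of quietly serving analytics built on yesterday's data.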
An example of where a DataOps approach could have helped is the early big data projects. Several studies have shown that very few of these projects delivered a decent ROI, or even any real business value. Most were run by IT and/or data engineers who focused almost exclusively on storing the data in Hadoop or an equivalent technology. Everyone was focused on putting data into the store, not on how to get data out.
Not surprisingly, these massive data stores were then vastly under-utilized. Because no one from the business or data-consumer side was involved in defining requirements, the collected data was either useless or indecipherable. A DataOps approach would have first had IT work closely with the business to define requirements, and then take an iterative approach to make sure the initial collected data was meeting business needs before the spigot was opened up.
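One way to run that iterative loop is to encode the business's requirements as executable expectations and evaluate them against each early increment of data, so gaps surface while the volume is still small. A minimal sketch, assuming a hypothetical orders table; the two rules shown are placeholders for whatever requirements the business actually defines.

```python
import sqlite3
from typing import Callable

# Each expectation: (description, SQL returning one count, pass condition).
# Table, columns and rules here are hypothetical placeholders.
EXPECTATIONS: list[tuple[str, str, Callable[[int], bool]]] = [
    ("no orders with negative totals",
     "SELECT COUNT(*) FROM orders WHERE total < 0",
     lambda n: n == 0),
    ("customer_id always populated",
     "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL",
     lambda n: n == 0),
]

def validate(conn: sqlite3.Connection) -> list[str]:
    """Return a description of every failed expectation."""
    failures = []
    for description, query, passes in EXPECTATIONS:
        (count,) = conn.execute(query).fetchone()
        if not passes(count):
            failures.append(f"{description} (offending rows: {count})")
    return failures
```

Each failed expectation becomes an agenda item for the next iteration with the business, rather than a surprise after the full pipeline is built.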
Once there is a foundation of DataOps in place, there are additional strategies to consider for automating and accelerating your data pipelines.
Want to learn more? Download our eBook: “Enterprise Architect’s Guide: Top 4 Strategies for Automating and Accelerating Your Data Pipeline”.