If “big data” is expected to be a mainstay of the operational infrastructure for a broad range of companies (that is, both large companies and small and medium-sized businesses), that implies the need to acquire and integrate the massive data sets that give big data its name (duh!).
Yet in previous blog posts we have already established the existence of a bottleneck in delivering data into big data platforms, and this problem is significantly exacerbated by the (somewhat ingenuous) expectation that massive data sets can be easily provisioned over the Internet. In some cases, the information delivery delays are so oppressive that express mailing hard drives loaded with data sets is preferable to access over the Web!
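To make the hard-drive comparison concrete, here is a rough back-of-the-envelope sketch. Every number in it is an assumption chosen for illustration (a 10 TB data set, a 100 Mbit/s link running at 80% efficiency, a 24-hour courier), and the function names are hypothetical; it also ignores the time needed to copy data onto and off the drives.

```python
def transfer_hours(size_terabytes: float, link_megabits_per_s: float,
                   efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move a data set over a network link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable after protocol overhead and contention.
    """
    size_bits = size_terabytes * 1e12 * 8  # decimal terabytes -> bits
    effective_bits_per_s = link_megabits_per_s * 1e6 * efficiency
    return size_bits / effective_bits_per_s / 3600


def faster_by_courier(size_terabytes: float, link_megabits_per_s: float,
                      courier_hours: float = 24.0) -> bool:
    """Return True when an assumed courier delivery beats the network transfer."""
    return courier_hours < transfer_hours(size_terabytes, link_megabits_per_s)


if __name__ == "__main__":
    hours = transfer_hours(10, 100)
    print(f"10 TB over 100 Mbit/s: about {hours:.0f} hours ({hours / 24:.1f} days)")
    print("Courier wins:", faster_by_courier(10, 100))
```

Under these assumed figures the network transfer takes well over a week, which is why shipping drives can look attractive; with a much smaller data set or a much faster link the comparison flips.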
But before we start to engineer a solution, it is valuable to consider some of the business challenges that result from data access and delivery latency and bottlenecks, in both analytical and operational business application scenarios. Some of the implications affect day-to-day operations, such as:
- Longer durations of key cross-functional business processes – Organizations increasingly recognize the need to monitor cross-functional process performance, with key metrics focusing on the end-to-end duration of the process. For example, one key measure for the order-to-cash process is the order-to-fulfill process time, with the goal of reducing that cycle time. Operational delays related to data availability (slowed by the data latency bottleneck) will increase the duration of those kinds of processes; your operational performance metric will suffer even though the root cause has nothing to do with your team’s actual performance.
- Delayed accessibility to “warm” data – One of the promises of big data technology is the ability to create interim archival frameworks using tools like the Hadoop Distributed File System (HDFS) to store and manage large data sets that various types of business processes might need to access (a good example is e-discovery in the legal industry). These “warm data fields” must be loaded with the source data, which is expected to come from both internal and external locations. Reliance on delivery over traditional channels such as the Internet will prove to be a source of indigestion.
- Scalability of business intelligence and analytics – When the demand for integrated, “right-time” analytics increases (especially as more business applications embed the use of analytical results within standard operating procedures), there is a corresponding need for larger data volumes shared with a broader constituency of users, as well as for high-speed data accessibility and current data. All of these are undermined by sluggishness in information availability.
The three examples above are specific cases of a more general operational failure: IT cannot meet agreed-to service levels across the organization because the channels for data movement have been inadvertently narrowed. Next time, we look at some of the analytical implications associated with the information delivery bottleneck.