In my last post, I noted that over the past twenty years, increased hardware performance, coupled with innovations in data management and integration tools, has expanded the spectrum of business intelligence users, which in turn has only added to the demand for information availability. This post looks at an alternate set of drivers for information availability: the potential of harnessing and analyzing massive data sets to create business value.
The appeal of big data analytics stems from three specific perceptions of benefit:
- The expectation of consuming very large data sets and separating relevant business “signals” from the massive amount of “noise.”
- The lowered barrier to entry afforded by scalable combinations of commodity components.
- The ease of implementation and use, especially in contrast to the effort associated with monolithic enterprise data warehouses.
The promise of big data analytics additionally reflects the “democratization of business intelligence” effect I suggested in a prior post in that it presents a methodology for rapidly conjuring up an environment that, once the massive amounts of data are loaded, can enable a wide spectrum of investigations and analyses in a way that is both scalable and flexible.
Of course, that one qualification – once the massive amounts of data are loaded – remains the kicker, as it did for enterprise data warehousing and integrated pervasive business intelligence. Big data platforms are configured to take advantage of massive parallelism, and their incorporation of commodity computational and storage componentry enables elasticity and scalability within the platform environment. However, the challenges that often accompany this part of the process include:
- Data sets to be analyzed generally do not originate within the analysis platform;
- The data comes in different sizes and formats; and
- Much of that data is bound to have little or no structure at all.
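As a rough sketch of the normalization burden these challenges imply, the snippet below coerces records arriving in different shapes — JSON, comma-separated rows, and free text — into a single flat structure before loading. The field names (`source`, `payload`) are illustrative placeholders, not part of any particular platform’s schema:

```python
import csv
import io
import json

def normalize(record: str) -> dict:
    """Coerce a raw record (JSON, CSV, or free text) into one flat shape.

    The target fields ("source", "payload") are hypothetical placeholders;
    a real ingestion pipeline would map onto its own schema.
    """
    # Structured: JSON objects pass through with a format tag.
    try:
        obj = json.loads(record)
        if isinstance(obj, dict):
            return {"source": "json", "payload": obj}
    except json.JSONDecodeError:
        pass
    # Semi-structured: comma-separated rows become positional columns.
    if "," in record:
        row = next(csv.reader(io.StringIO(record)))
        return {"source": "csv",
                "payload": {f"col{i}": v for i, v in enumerate(row)}}
    # Unstructured: free text is kept verbatim for downstream analysis.
    return {"source": "text", "payload": {"raw": record}}

records = [
    '{"id": 1, "event": "click"}',
    'alice,login,2024-01-01',
    'server rebooted unexpectedly',
]
for r in records:
    print(normalize(r)["source"])  # prints: json, csv, text
```

Even this toy version hints at the real cost: every new source format adds another branch, and the unstructured fallback simply defers the hard work to the analysis stage.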
In addition, Internet bandwidth tends to be insufficient for timely delivery of massive data sets, adding further complexity and bottlenecks. The data sets must be collected, collated, and brought into the big data server before any of the analyses can begin.
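To put the bandwidth constraint in rough numbers, a back-of-the-envelope calculation shows why moving a massive data set is a project in itself. The data volume, link speed, and efficiency factor below are illustrative assumptions, not figures from this post:

```python
def transfer_hours(data_terabytes: float, link_gbps: float,
                   efficiency: float = 0.7) -> float:
    """Rough wall-clock hours to move a data set over a network link.

    efficiency discounts protocol overhead and link contention; 0.7 is
    an illustrative assumption, not a measured figure.
    """
    bits = data_terabytes * 8e12            # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency
    return bits / effective_bps / 3600

# A hypothetical 50 TB data set over a 1 Gbps connection:
print(f"{transfer_hours(50, 1.0):.0f} hours")  # roughly 159 hours, ~6.6 days
```

At those assumed rates the transfer alone takes the better part of a week, which is why shipping physical media or replicating data close to the analysis platform can beat the network outright.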
In other words, even with the promise of high-performance computation, the analysis processes are still limited by the challenge of making information available in “right” time. Environments that have not considered approaches such as data replication for speeding data movement will remain constrained by the data access bottleneck.