Don’t worry. My intent in this blog is not to discuss the pros and cons of the two approaches – you can read one of my previous blog posts for that – or describe the benefits of a data lake (there is another post on that). Instead, my intent is to highlight how some of the key benefits of data lakes remain undervalued.
Most people – at least from my somewhat geeky world of data engineering/DataOps – understand the value of data lakes as low-cost repositories for storing large volumes of all types of data in their native formats. Some even recognize their value as a central source to feed AI, machine learning and other data science initiatives. But, unfortunately, many companies still don’t appreciate their value for:
- Data democratization;
- Real-time/low-latency insights; and,
- Support for multiple use-cases (e.g., BI and reporting) and user types.
Recently, I had the pleasure of co-presenting a webinar with Joe Spinelle, Director of Engineering and Technology at J.B. Hunt, one of the largest transportation and logistics companies in North America, and Nauman Fakhar, Director of ISV Solutions at Databricks, a valued technology partner of ours. In discussing these issues, Joe highlighted the need to provide real-time, low-latency data from data lakes to support three distinct and heavily active user groups – core application users, the data insights and analytics team, and data scientists.
As at many other large enterprises, different user groups at J.B. Hunt had different data needs. The application engineering group needed (near) real-time access to data in their core systems so they could quickly resolve operational issues involving assets or logistics. The data science team wanted to build, train and score AI and machine learning models, while the data insights and analytics team wanted to provide traditional BI and reporting. J.B. Hunt not only wanted to meet the analytic demands of multiple user groups while relieving pressure on core operational systems; it also wanted to give data consumers the same real-time access to data they were accustomed to with operational systems.
It was a tall order for a small team – especially given the need to ingest a variety of data types from multiple source systems. The company needed efficiency and centralized control to proactively manage issues.
Working with Qlik and Databricks, Spinelle’s team was able to fully automate real-time data pipelines from multiple source systems to operationalize its data lake, without deploying an army of data engineers. The result was accelerated delivery of analytics-ready data to its users with one-to-two-minute latency. Using Qlik and the Azure Databricks solution, the company not only managed to democratize its data for a variety of users, it also spurred innovation, with the data science team leveraging the data for AI-driven competitive counteroffer bidding and real-time service pricing.
Interested in learning more?
Watch our on-demand webinar, titled “How J.B. Hunt is Improving Efficiency and Customer Experience With AI and Automated Real-Time Data Pipelines,” and see for yourself how Qlik and Databricks together helped J.B. Hunt architect and operationalize a performant data lake.