The conference attendees were all data and analytics leaders who were keen to enhance their data strategies and were excited to share their own experiences. But, before I go any further, I'd like to send a big thank you to TDWI's Mark Peco for moderating such a splendid discussion.
The Day of the Roundtable
After initial pleasantries and introductions, the makeup of the group became clear. The contributors came from many industries and from companies of varying sizes, and many were at different stages of their analytics journeys. Some had well-established extract, transform, load (ETL) practices, while others were just starting to formalize their data strategies: all in all, a very diverse set of practitioners. Despite those differences, every participant expressed a common goal: to provide data to their businesses in an efficient, reliable, and repeatable manner.
Is Data Pipeline a Marketing Term?
The moderator, Mark, opened the roundtable discussion with his opinion on data pipelines. He stated that he thought of data pipelines as "automated processes that move data from one system to another, transforming the data along the way, until it's ready for business consumption." He expanded on that general description by clarifying that he believed many data pipeline types were in use today, drawing an analogy with real-world pipelines to illustrate the point. "For example," he stated, "the Trans-Alaskan Pipeline system transports crude from the oil fields to the Valdez Marine Terminal, but it isn't the same type of pipeline that delivers water from rural reservoirs to your house." Overall, the attendees were in general agreement.
I chimed in with what I am seeing from a market perspective. I commented on how many customers mix and match pipelines as business requirements dictate. Some customers are perfectly happy with their existing batch ETL pipelines for legacy reporting requirements, but many also use real-time data pipelines based on change data capture technology for newer analytics processes, such as fraud reporting and machine learning. I also explained how some customers prefer an ELT pipeline approach, especially when cloud data warehouses or data lakes are the target repository. In that model, the data is extracted from the source systems in real time and delivered to the target system as-is, with the transformation occurring later, inside the target.
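The ELT approach described above can be sketched in a few lines. This is a minimal, hypothetical example (the table names and data are invented, and SQLite stands in for a cloud data warehouse): raw records are extracted and landed untransformed in a staging table, and the transformation runs later inside the target itself.

```python
import sqlite3

def extract():
    # Stand-in for reading records from a source system in real time.
    # Note the amounts arrive as untyped strings.
    return [("2024-01-01", "100.5"), ("2024-01-02", "99.0")]

def load_raw(conn, rows):
    # "L" before "T": land the data as-is in a staging table.
    conn.execute("CREATE TABLE IF NOT EXISTS staging_sales (day TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", rows)

def transform_in_target(conn):
    # The transform runs inside the target system, after loading,
    # typically as SQL pushed down to the warehouse engine.
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT day, CAST(amount AS REAL) AS amount FROM staging_sales"
    )

conn = sqlite3.connect(":memory:")
load_raw(conn, extract())
transform_in_target(conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 199.5
```

The design point is that the target does the heavy lifting, which is why ELT pairs naturally with scalable cloud warehouses.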
As the discussion progressed, it was evident that data pipelines are not just a marketing concept, but something that thousands of companies use every day to deliver the right data to their business users.
Is ETL Still Relevant?
Although many people use the terms “ETL” and “data pipeline” synonymously, we agreed that they are different. ETL pipelines refer to processes that extract data from one system, transform it and load it into another. ETL is usually performed in a batch mode and the target is either a database or a data warehouse.
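That definition can be made concrete with a small sketch. The data, field names, and in-memory "warehouse" below are all hypothetical; the point is only the batch E-then-T-then-L shape: extract a full batch from the source, transform it, and load the result into the target.

```python
import csv
import io

# Hypothetical source data, standing in for an export from a source system.
SOURCE_CSV = "id,amount\n1,10\n2,-3\n3,7\n"

def extract(text):
    # Extract: read the full batch from the source.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: convert types and filter out negative amounts.
    return [
        {"id": int(r["id"]), "amount": int(r["amount"])}
        for r in rows
        if int(r["amount"]) > 0
    ]

def load(rows, warehouse):
    # Load: write the transformed batch into the target (a list here,
    # standing in for a database or warehouse table).
    warehouse.extend(rows)

warehouse = []
load(transform(extract(SOURCE_CSV)), warehouse)
print(warehouse)  # [{'id': 1, 'amount': 10}, {'id': 3, 'amount': 7}]
```

Everything happens in one scheduled pass over the whole batch, which is the defining trait of classic ETL.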
Data pipelines, on the other hand, are a superset and refer to any set of processing elements that move data from one system to another, possibly transforming the data along the way. However, data pipelines are largely associated with streaming computation (meaning every event is handled and processed as it occurs). Additionally, data pipelines don’t have to end in loading the data into a database or a data warehouse. They can, for example, trigger further business processes or data processing on other systems.
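To contrast with the batch case, here is a minimal sketch of the streaming, event-at-a-time shape described above, using invented event data. Each event is handled as it occurs, and the pipeline ends by triggering a downstream action (flagging a transaction for review) rather than loading into a database.

```python
alerts = []

def on_event(event):
    # Process each event as it arrives; large transactions trigger
    # a follow-on business process instead of a warehouse load.
    if event["amount"] > 100:
        alerts.append(f"review txn {event['id']}")

# Hypothetical event stream (a real pipeline would consume these from
# a message bus or change-data-capture feed).
events = [
    {"id": "t1", "amount": 50},
    {"id": "t2", "amount": 250},
]

for event in events:
    on_event(event)

print(alerts)  # ['review txn t2']
```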
The subject of the roundtable discussion then turned to “transitioning from ETL,” as mentioned in the title of the session, with many questioning whether it’s necessary to move away from ETL. My view is that you should always use the best tool for the job, and, if it’s not broken, then don’t fix it. Some participants commented that changing their traditional ETL pipelines can take weeks and mused that perhaps they should investigate alternative data delivery solutions.
By now, we had reached the session's time limit and had to wrap up with a brief summary. What did we learn? Well, I was reminded, once again, that no single solution meets every user's demands. I was also reminded that not everyone is at the same stage of their data journey. The key takeaway for me was that data pipelines can give businesses the adaptability and flexibility they sorely need to deliver the right data in today's highly dynamic environments.