As stated above, "data pipeline" is the broader term: it covers any process that moves data between systems, a definition that still holds under today's data fabric approach. ETL pipelines are one particular type of data pipeline. Below are three key differences between the two:
First, data pipelines don’t have to run in batches. ETL pipelines usually move data to the target system in batches on a regular schedule. But certain data pipelines can perform real-time processing with streaming computation, which allows data sets to be continuously updated. This supports real-time analytics and reporting and can trigger other apps and systems.
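The batch-versus-streaming distinction can be sketched in a few lines of plain Python. This is an illustrative sketch, not tied to any particular tool: the batch function processes a complete data set at once (as a scheduled ETL job would), while the streaming function is a generator that handles each record as it arrives, so the output is updated continuously.

```python
from typing import Iterable, Iterator


def batch_pipeline(records: list[dict]) -> list[dict]:
    # Batch style: the whole data set is processed in one pass,
    # typically on a fixed schedule (e.g. nightly).
    return [{**r, "amount": r["amount"] * 2} for r in records]


def streaming_pipeline(records: Iterable[dict]) -> Iterator[dict]:
    # Streaming style: each record is transformed the moment it
    # arrives, so downstream consumers see updates continuously.
    for r in records:
        yield {**r, "amount": r["amount"] * 2}
```

In a real deployment the streaming source would be a message queue or change-data-capture feed rather than an in-memory iterable, but the control flow is the same.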
Second, data pipelines don’t have to transform the data. ETL pipelines transform data before loading it into the target system. But data pipelines can either transform data after loading it into the target system (ELT) or not transform it at all.
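The ETL-versus-ELT ordering difference can also be shown concretely. In this hypothetical sketch the "target system" is just a Python list: the ETL function transforms rows before loading them, while the ELT function loads raw rows first and applies the same transformation afterwards, as a warehouse would with in-database SQL.

```python
def etl(source_rows: list[dict], target: list[dict]) -> None:
    # ETL: transform in the pipeline, then load finished rows.
    transformed = [{"name": r["name"].strip().title()} for r in source_rows]
    target.extend(transformed)


def elt(source_rows: list[dict], target: list[dict]) -> None:
    # ELT: load raw rows first; the transformation runs later,
    # inside the target system (simulated here as an in-place pass).
    target.extend(dict(r) for r in source_rows)
    for row in target:
        row["name"] = row["name"].strip().title()
```

Both end with the same clean data in the target; the practical difference is where the compute happens and that ELT keeps the raw rows available until (and unless) they are transformed.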
Third, data pipelines don’t have to stop after loading the data. ETL pipelines end after loading data into the target repository. But data pipelines can stream data, and therefore their load process can trigger processes in other systems or enable real-time reporting.
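A load step that triggers downstream processes can be sketched with a simple observer pattern. The `Loader` class and its callbacks here are hypothetical names for illustration: each loaded record is stored and then pushed to every subscriber, which is how a streaming pipeline's load step can feed real-time reporting or kick off work in other systems instead of simply ending.

```python
from typing import Callable


class Loader:
    """Stores loaded records and notifies subscribers on each load,
    so downstream systems can react immediately."""

    def __init__(self) -> None:
        self.store: list[dict] = []
        self.subscribers: list[Callable[[dict], None]] = []

    def on_load(self, callback: Callable[[dict], None]) -> None:
        # Register a downstream process (alerting, reporting, etc.).
        self.subscribers.append(callback)

    def load(self, record: dict) -> None:
        self.store.append(record)
        for notify in self.subscribers:
            notify(record)  # trigger the downstream process
```

For example, subscribing a dashboard's update function via `on_load` means every record landing in the target also refreshes the report, with no separate polling job.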