Data Engineer vs. Data Scientist

I received a lot of questions after posting my data pipelines article about who creates and maintains them: more specifically, whether that is the responsibility of the folks in data engineering or should be left to the data scientists. The confusion is understandable when you examine job descriptions, because it’s common to see the phrase “must be able to build data pipelines” for both roles. Therefore, this post describes how I define the difference.

Which Department and What Are Their Objectives

Let’s first examine the role of the data engineer and where they work. The job title has gradually emerged in recent years and has generally replaced the older titles of “integration engineer” or “ETL developer.” Consequently, it’s a job that typically reports into central IT and focuses on the architecture, infrastructure and processes needed to design, develop and maintain enterprise-wide data pipelines.

The data scientist, on the other hand, is someone who most likely reports into a business unit and is responsible for implementing pipeline procedures to clean, wrangle, model and analyze data. Consequently, although both roles are responsible for acquiring data in a usable format, the ultimate goals are considerably different.

A comparison table showing that a Data Engineer reports to Central IT, while a Data Scientist reports to the Line of Business Operations.

Data Engineering Responsibilities

Data Scientists' Responsibilities

Data scientists will usually get data that has passed the “first round” of cleaning and manipulation from data engineering, which they then use to feed their analytics applications, machine learning projects and statistical predictive models. However, data scientists also use their pipelines to augment that data with industry research, demographic information and behavioral data to answer pressing business questions.

Comparison chart: Data Engineer - Creates data pipelines for multiple business entities; Data Scientist - Creates "last mile" data pipelines for analytics and machine learning.

Although there is some overlap in skillsets, the two roles are distinct. The data engineer has skills best suited to working with database systems, data APIs and ETL/ELT solutions, and will be involved in data modeling and maintaining data warehouses, whereas the data scientist has experience with statistics, math and machine learning for predictive models.

Languages, Software, Skills and Tools

Having mentioned the skill overlap, let’s now examine the differences in the skillsets, languages, tools and software that the two roles use. The languages, software, tooling and infrastructure used by data engineers run the gamut of enterprise IT. As we mentioned earlier, that includes the traditional trove of data tools such as SQL and ETL/ELT solutions. Increasingly, knowledge of public cloud infrastructure solutions from Amazon, Google and Microsoft is considered mandatory for the modern data engineer. Many a data engineer also uses Qlik Data Integration as a core component to architect their data pipelines.
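To make the SQL and ELT side of the data engineer's toolkit a little more concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table names, the sample rows and the run_elt function are all hypothetical and exist only for illustration; a real enterprise pipeline would use a proper warehouse and integration tooling rather than an in-memory database.

```python
import sqlite3

# Hypothetical raw records, standing in for an extract from a source system.
RAW_ORDERS = [
    ("1001", " north ", "250.00"),
    ("1002", "South", "99.50"),
    ("1003", "north", None),      # missing amount, dropped by the transform
    ("1004", "SOUTH", "10.50"),
]


def run_elt(conn, rows):
    """Load raw rows as-is, then transform them inside the database with SQL."""
    cur = conn.cursor()

    # Load: land the data untouched in a staging table (every column as text).
    cur.execute("CREATE TABLE stg_orders (order_id TEXT, region TEXT, amount TEXT)")
    cur.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)

    # Transform: fix types and normalize values with SQL into a curated table.
    cur.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(region))       AS region,
               CAST(amount AS REAL)      AS amount
        FROM stg_orders
        WHERE amount IS NOT NULL
        """
    )
    conn.commit()

    # Serve: an aggregate that a downstream consumer might query.
    return cur.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    try:
        print(run_elt(connection, RAW_ORDERS))
    finally:
        connection.close()
```

The point of the sketch is the order of operations: in ELT the raw data is loaded first and cleaned afterwards with SQL inside the database, whereas an ETL pipeline would apply the same cleaning before loading.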

Data scientists make use of languages such as R, Python, Julia and Scala to build models. The most popular, however, are Python and R. When you’re working with these languages for data science, you will most often rely on open-source libraries, such as pandas and NumPy for Python.
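As a rough illustration of the "last mile" work described earlier, the following sketch uses pandas and NumPy to clean a small curated dataset, augment it with demographic data and fit a simple linear model. The data, the column names and the build_features and fit_line functions are all made up for this example; it is a sketch under those assumptions, not a recommended modeling approach.

```python
import numpy as np
import pandas as pd

# Hypothetical curated data, as it might arrive from data engineering.
sales = pd.DataFrame({
    "region": ["north", "south", "east", "west"],
    "revenue": [120.0, 80.0, None, 100.0],  # one missing value to clean
})

# Hypothetical external demographic data used to augment the sales data.
demographics = pd.DataFrame({
    "region": ["north", "south", "east", "west"],
    "population": [60, 40, 30, 50],  # in thousands
})


def build_features(sales_df, demo_df):
    """Drop rows with missing revenue, then join in the demographic data."""
    cleaned = sales_df.dropna(subset=["revenue"])
    return cleaned.merge(demo_df, on="region", how="inner")


def fit_line(features):
    """Fit revenue = slope * population + intercept by least squares."""
    slope, intercept = np.polyfit(
        features["population"].to_numpy(dtype=float),
        features["revenue"].to_numpy(dtype=float),
        deg=1,
    )
    return float(slope), float(intercept)


if __name__ == "__main__":
    features = build_features(sales, demographics)
    print(features)
    print(fit_line(features))
```

Here pandas handles the cleaning and joining, while NumPy does the numerical fit; a real project would typically swap the straight-line fit for a statistical or machine learning model.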

Finally, we can’t leave this data science skills discussion without covering data visualization and storytelling. Although the data scientist role might focus on using Jupyter Notebooks with Python’s Matplotlib, many turn to Qlik Data Analytics for enterprise-scale business intelligence and analytics visualizations, too.

Comparison chart displaying skills for Data Engineer and Data Scientist roles. Data Engineer: SQL, Python, Spark, AWS, ETL/ELT. Data Scientist: SQL, Python, Java, modelling, NumPy, Matplotlib, Jupyter.

Salaries and Outlook

Now, this is the section you’ve all been waiting for. How do salaries compare? It’s true that the data scientist role has been in massive demand for a few years, but recently the temperature seems to have cooled a little. U.S. News & World Report’s 2021 job survey still lists Data Scientist as the eighth best job in the United States. And Glassdoor lists the median salary as approximately $114,000. Not at all shabby!

Not to be outdone, data engineering is in strong demand, too. A quick search of LinkedIn highlights over 200,000 available jobs worldwide. Again, if we check Glassdoor, the average data engineer’s salary in the United States is about $110,000. That’s only slightly lower than that of one of the top 10 most desirable jobs!

A chart compares salaries: Data Engineer with a median salary of $110,000 and Data Scientist with a median salary of $114,000.

Conclusion

We could argue that the “data scientist bubble” is about to burst, but there’s no denying that the demand for data expertise is strong, with a positive outlook for the immediate future. Either way, one thing is certain: your prospects look good whether you choose data engineering or data science.

Confused over differences between #dataengineers & #datascientists? @Qlik's @cbearman gives you all the answers - and also compares #salaries.
