You all know the story: since the famous 2012 Harvard Business Review article called Data Scientist the Sexiest Job of the 21st Century, the field has seen a lot of growth, both in companies hiring and in people wanting to get started.
As companies started “doing” Data Science and Machine Learning, one question arose: where does the data come from? Initially, most companies made Data Scientists responsible for getting and processing the data, but that didn’t always match their skill sets or their interests. So, a new area started to grow: Data Engineering.
Data can come from several different sources: IoT devices, e-commerce activity, banking transactions, open government data, and all kinds of user activity on web or mobile applications. To be of any value to the company, this data first needs to be collected, processed, transformed, and stored. This entire data pipeline (which can run either in real-time streams or in batches at a regular cadence) is the responsibility of data engineers. Data engineering teams design, build, maintain, and extend data pipelines and the infrastructure that supports them.
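To make the collect, process, transform, and store stages more concrete, here is a minimal sketch of a single batch run in Python. Everything in it is an assumption made for illustration: the sample records, the `purchases` table, and the function names are hypothetical, and an in-memory SQLite database stands in for whatever storage a real pipeline would use.

```python
import csv
import io
import sqlite3

# Hypothetical raw input. In a real pipeline this would come from an API,
# a message queue, or files dropped by another system.
RAW_EVENTS = """user_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,USD
"""


def collect(raw_text):
    """Collect: read raw records from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def process(rows):
    """Process: drop records that have no amount."""
    return [row for row in rows if row["amount"]]


def transform(rows):
    """Transform: cast types and normalize the currency code to upper case."""
    return [
        (int(row["user_id"]), float(row["amount"]), row["currency"].upper())
        for row in rows
    ]


def store(records, connection):
    """Store: write the cleaned records into a table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(user_id INTEGER, amount REAL, currency TEXT)"
    )
    connection.executemany("INSERT INTO purchases VALUES (?, ?, ?)", records)
    connection.commit()


def run_batch(raw_text, connection):
    """Run one batch end to end and return how many records were stored."""
    records = transform(process(collect(raw_text)))
    store(records, connection)
    return len(records)


# An in-memory database keeps the sketch self-contained.
connection = sqlite3.connect(":memory:")
stored = run_batch(RAW_EVENTS, connection)
print(f"Stored {stored} records")
```

In a batch setup, something like `run_batch` would be triggered on a schedule. In a streaming setup, the same stages would be applied to each record or small group of records as it arrives.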
Given this definition of Data Engineering, it is easy to see that the work of data engineers is complementary to the work of Data Scientists. But how many data engineers per data scientist should a company have? Estimates vary (and they depend very much on the sector and type of company), but most people agree that we need at least one data engineer for each data scientist, and in most situations more than that. Just to give an example, this O’Reilly article talks about a starting point of 2-3 data engineers for each data scientist, but suggests that, depending on the complexity of the company structure, the number could be as high as 4-5 data engineers for each data scientist.
So, if the demand for data engineers is tied to the demand for data scientists, and the data science area is growing, we should see an increase in data engineering jobs, right? Yes, and I want to bring a few numbers to illustrate that:
In this KDnuggets article, the author explored all the data roles that Y Combinator companies have been hiring for since 2021. Their conclusion? There are a lot more open positions for Data Engineers than for Data Scientists (also, as you can see in the graph below, the second most requested role is ML Engineer, which has a lot in common with Data Engineering in terms of software development practices):
More numbers: the Dice 2020 Tech Job Report, as seen here, found that Data Engineer was the fastest-growing job in technology in 2020, with year-on-year growth of 50% (followed by Backend Developer with 38% and Senior Data Scientist with 32%). Finally, in this article, the author counted the job openings on LinkedIn and Indeed, generating the table below:
As you can see in the table, there are more job openings for Data Science than for Data Engineer. However, for both Data Engineer and Big Data Engineer, the ratio of candidates to open positions is about half that of Data Science, which means that there are many more people interested in Data Science relative to the number of job openings.
So, after all this explanation, am I saying that you all should leave Data Science and become Data Engineers? No, that’s not my point. The biggest takeaway here is that there is demand for both jobs and people should at least be open to the idea of studying and becoming a Data Engineer.
Speaking of becoming a Data Engineer, Poatek has open positions for Data Engineering internships, so if you’re interested, you can apply here. If you are already looking for a full-time job, we also have open positions for Data Engineers here. Finally, if you want to learn more about Spark and Big Data, I recently wrote a series of posts, which you can read here and here.