Andreas Bartsch, Head of Service Delivery at PBT Group
Thanks to the availability of artificial intelligence (AI) and machine learning (ML), more companies are focusing their efforts on appointing data scientists to extract business value from the data at their disposal. This has made the role of data engineers even more critical, as their skills are essential to preparing this data for use.
With so much industry buzz around data science, many companies are confusing the two roles and even seeing them as interchangeable. People are drawn to the benefits that AI and ML can deliver to an organisation. Many educational institutions have even developed degrees around data science. And while data science is essential to transforming data into actionable insights, it is the job of engineers to get the data to that stage.
The data engineer is therefore the enabler of data science. If scientists cannot get quality data at the right time and in the appropriate format, they cannot do their jobs effectively. So, while science unlocks the business value of data, much of the effort to enable this lies in the engineering component. For every data scientist an organisation appoints, it needs to ensure it has multiple data engineers on board to help get everything in place.
An evolving role
Unlike data science, which has grown as a concept over the past five years, data engineering has been around for a long time. Previously, its practitioners were referred to as extract-transform-load (ETL) developers. Over the past two decades, this role has evolved into more of an engineering one.
Much of the engineer's evolved focus is on big data and distributed systems. British engineer Gordon Lindsay Glegg is quoted as saying: ‘A scientist can discover a new star, but he cannot make one. He would have to ask an engineer to do it for him.’
And therein lies the crux of the skillset required to be a data engineer.
There are certainly aspects of this role that are being taught at university. For example, information management studies touch on the skills required for these engineers. Experience with development platforms and languages such as SQL and Python is also vital.
But graduating with those skills in place does not in itself make a student a data engineer. Instead, much of the role is shaped by the experience and exposure gained by working in this environment.
This is not to say that engineers and scientists can operate completely independently of one another. Engineers might not have to be experts in the tools and technology that scientists use, but they need to be exposed to them and understand them. This enables them to better align with what scientists require.
Therefore, the more rounded a data engineer is, the better. They must receive training on the broader concepts of the modern data world – not only on the technology side but also on data modelling.
With this grounding, they will be better positioned to design and build the data pipelines that serve the scientists and all other data consumers. The data engineer role is therefore a critical cog in the digital business world and one that must be filled if organisations are to remain relevant.