I recently read a number of articles comparing ETL to ELT which inspired me to write this blog post. ETL and ELT share many similarities but they are also fundamentally different. In this short blog post, I will talk about what I think the key differences between ETL and ELT are and why I think doing ELT is better than the traditional ETL process.
What are ETL and ELT?
ETL and ELT are data integration processes used in data warehousing that show the movement of data from its source to its destination. ETL and ELT are abbreviations that stand for extract, transform, load (ETL) and extract, load, transform (ELT).
The ETL process looks like this:
The data is extracted from all sources. It is then transformed and finally loaded to its destination, a data warehouse.
The ELT process is slightly different and looks like this:
The graph above is very similar to the first one. The main difference is that the data is transformed only after it was loaded to the data warehouse first. The data is not changed on the way to its destination.
The importance of business rules
There are a lot of articles online comparing ETL to ELT. The two processes are usually compared in terms of technology used (eg ADF vs SSIS), costs, target systems (data warehouse vs data lake), data loading times, etc. These articles are useful because they can give us a lot of insight as to what the two processes are and when to use which but they don't usually talk about one important aspect: the business rules.
I believe that the main difference between ETL and ELT comes down to one key thing and that is: where are the business rules applied? When we do ETL, we apply business rules before the data gets loaded into a data warehouse. When we do ELT, the business rules are applied at the very end of the process.
ETL and ELT are technology independent
Since the main difference between ETL and ELT is the place where the business rules are applied, It will not matter what tool you use: be it SSIS, ADF, Data Stage, etc. In fact; you can do ETL and ELT in any tool you choose! What matters is where the business logic happens. When you apply any business logic between your source and destination you are doing ETL. If you load the data first in its raw format and apply business logic at every end, you are then doing ELT.
What this implies is that the ETL and ELT processes are technology-independent. The tool you use does not determine what process you follow. What does matter is how the tool is used. And so, SSIS can be used to do ELT while ADF can be used to ETL or vice versa.
Why choose ELT over ETL?
1. ELT is simpler
ELT is simpler because we simply copy data in its raw format from one place to another. The data in the destination is a 1 to 1 copy of the data in the source. This is a much simpler process to implement and extend with new sources., When a new source is added, we simply add it to the existing ELT mechanism. We don't touch the existing data flows and as a result that the risk of breaking our solution is smaller.
2. ELT is easier to maintain.
Another benefit that the ELT offers is that data transformation can be done virtually on top of our raw tables or files. Instead of materializing our transformed data into tables or files, we can create views with our transformed data. The process to make changes to views is much simpler than making changes to tables or files. Data virtualization allows us to be more agile and adapt to changes quicker with less disruption to existing processes.
One problem with virtualization can be query performance. Eg SQL views are not as performant as physical tables. When performance becomes an issue the data might have to be materialized to make our queries run faster.
ETL and ELT are data integration processes used in data warehousing. The difference between ETL and ELT is one letter but what a difference it makes!
In my opinion, the key difference between the two processes lies in where the business logic is applied. If business rules are applied before the data gets loaded to its destination we do ETL. If the business logic gets applied after the data was loaded to its final destination we do ELT. In my opinion, ELT is better because it is simpler and easier to maintain. ELT is not going to solve all your data integration problems. Data still needs to be transformed and cleaned which usually takes a big majority of time in all projects. One can even argue that ELT is simply kicking the can down the road and worrying about the T component later. I will disagree with that. In my opinion, ELT allows us to separate data acquisition from information delivery. We break a bigger task into two smaller ones which allow us to deal with these two tasks in a more efficient way.