Extract, transform, and load.
It doesn’t sound too complicated.

But, as anyone who’s managed a data pipeline will tell you, the simple name hides a ton of complexity. And while none of the steps are easy, the part that gives data engineers nightmares is the transform.

It means taking raw data, cleaning it, filtering it, reshaping it, summarizing it, and rolling it up so that it’s ready for analysis.

That’s where most of your time and energy goes, and it’s where there’s the most room for mistakes.

If ETL is so hard, why do we do it this way?
The answer, in short, is that there was no other option.

Data warehouses couldn’t handle the raw data as it was extracted from source systems, in all its complexity and size.
So the transform step was necessary before you could load and eventually query data.

The cost, however, was steep.