One of the common problems in any data integration and data warehouse project is to understand the source data.
The data warehouse consists of data from different internal and external objects.
These objects are sometimes poorly documented or their ERD is not available.
Furthermore, the source objects are often not from the same single native source.
In many cases analyzing the sources and understanding the relationship between the objects is the Achilles‛
heel of the process.
Since analyzing the source is a complicated and expensive task, it is often postponed to
" . . . when there will be more time and budget . . . "
which usually results in this critical step being only partially completed or
skipped altogether. The consequence is that the data integration is performed without sufficient knowledge of the data.
In addition, since the data profiling process is often skipped, no reverse ERD is created to support the integration process.
Therefore, it is most likely that poor (or no) data source profiling is the main reason for failures of data warehouse projects.
The lack of visual models creates a barrier between the business customer and the technical team and between the target ODS
and its sources.
Since the data source profiling and visual reversed ERD model is a fundamental step of the data warehouse, DWLOGIC developed
a profiler-ERD-Engine mechanism. It is designed to almost automatically support the entire process ranging from
gathering the data sources via data profiling, to breaking down the "pseudo" ERD model to subject areas.
Below are the profiler-ERD-Engine steps (on a macro level):
Step 1: Loading multiple data sources into the Profiler repository (ERD-Engine).
The purpose of this step is to unify all the source objects from the different data sources into a single
schema in order to be able to analyze the spread sources as if they were one single source.