Ids in Source System: Sometimes we would have been in a dilemma to decide whether to carry the identity (id) fields from source system as identity fields into the data warehouse as well. There is couple of situations which would push us to this state1.The business users are familiar with the product ids like 1211 , 1212 than by the product name it self and they need them in the target system
2.Why
should we create an additional id field in the target table when I can
have a unique identity for each record from the source system
What are Opensource Id fields,
they are usually the surrogate keys or unique record keys like product
id, customer id etc which the business might be more familiar than with
their descriptions, descriptions of these ids are more found on report
printouts. In general most of the source id fields would get mapped to
the dimension tables.
Here are the reasons why we should carry the source id fields as it is
- The business is comfortable talking and analyzing in terms of ids than descriptions
- Having source ids fields which are usually numeric or if not of smaller length is very much lookup friendly, using ids for lookup or filter or join conditions when pulling data from source systems is much better than descriptions
- Source id fields enables linking of the data from the data warehouse to the source system
- Just consider the id as another attribute to the dimension and not as a unique identifier
Here are the reasons why we should create additional id (key) field in addition to the source id field
- Avoiding duplicate keys if the data to be sourced from multiple systems
- The source ids can merge, split, anything can happen, we would want to avoid the dependency on the source system
- The id field created in the data warehouse would be index friendly
- Having a unique id for each record in the data warehouse would help in determining the number of unique records in a much easier way
Dates in Source System:
One other field that we usually confuse is the date field in the source
systems. The date field present in the source record might provide the
information when the record arrived in the source system and a date
field generated in the data warehouse system would provide the
information when a source record arrived into the data warehouse.
The
data warehouse record date and the source record date can be same if
the source record gets moved into the data warehouse the same day,
certainly both date fields might represent different date values if
there is a delay in the source data arrival into the data warehouse.
Why we need to store Source Date
in the data warehouse, this need is very clear, we always perform date
based analysis based on the arrival of source record in the source
systems.
Why we need to generate a Data Warehouse Date,
capturing the arrival of the record into the data warehouse answers
queries related to audit, data growth and as well to determine what new
records arrived into the warehouse which is especially useful for
providing incremental extracts for downstream marts or other systems.
Read Brief About: Data Integration Challenge
Read Brief About: Data Integration Challenge