Pages

Ads 468x60px

Labels

Friday, 29 June 2007

Data Integration Challenge – Facts Arrive Earlier than Dimension


The fact transactions that come in earlier than the dimension (master) records are not bad data, such fact records needs to be handled in our ETL process as a special case. Such situations of facts coming in before dimensions can occur quite commonly like in case of a customer opening a bank account and his transactions starting to flow into the data warehouse immediately.
But the customer id creation process from the Customer Reconciliation System can get delayed and hence the customer data would reach the data warehouse after few days.
How do we handle this scenario differs based on the business process being addressed, there could be two different requirements
  • Make the fact available and report under “In Process” category; commonly followed in financial reporting systems to enable reconciliation
  • Make the fact available only when the dimension is present,; commonly followed in status reporting systems
Requirement 1: Make the fact available and report under “In Process” category
For this requirement follow the below steps
  1. Insert into the dimension table a record that represents a default or ‘In Process’ status like in case of the banking example the Customer Dimension would have a ‘default record’ inserted that represents the information that the customer detail has not yet arrived

  2. In the ETL process while populating the Fact table, for the transactions that do not have a corresponding entry in the Dimension table, assign a default Dimension key and insert into the Fact. In the same process insert the Dimensions Lookup values into a ‘temporary’ or ‘error’ table

  3. Build an ETL process that checks the new records inserted into the Dimension table, queries the temporary table and identifies the records in facts for which the dimension key has to be updated and updates the respective fact’s dimension key

Requirement 2: Make the fact available only when the dimension is present
For this requirement follow the below steps
  1. Build an ETL process that populates the fact into a staging table

  2. Build an ETL process that pushes only the records that have a dimension value to the data warehouse tables

  3. At the end of ETL process delete all the processed records from the staging table making the other unprocessed records available to be pulled next time

You can read more about  Data Integration Challenge

0 comments:

Post a Comment