Business Intelligence & Analytics: Data Integration Challenge

Friday, 13 July 2007

Data Integration Challenge – Capturing Changes

When we receive the data from source systems, the data file will not carry a flag indicating whether the record provided is new or has it changed. We would need to build process to determine the changes and then push them to the target table.

There are two steps to it

Pull the incremental data from the source file or table
Process the pulled incremental data and determine the impact of it on the target table as Insert or Update or Delete

Step 1: Pull the incremental data from the source file or table
If source system has audit columns like date then we can find the new records else we will not be able to find the new records and have to consider the complete data
For source system’s file or table that has audit columns, we would follow the below steps

While reading the source records for a day (session), find the maximum value of date(audit filed) and store in a persistent variable or a temporary table
Use this persistent variable value as a filter in the next day to pull the incremental data from the source table

Step 2: Determine the impact of the record on target table as Insert/Update/ Delete

Following are the scenarios that we would face and the suggested approach

Data file has only incremental data from Step 1 or the source itself provide only incremental data
- do a lookup on the target table and determine whether it’s a new record or an existing record
- if an existing record then compare the required fields to determine whether it’s an updated record
- have a process to find the aged records in the target table and do a clean up for ‘deletes’
Data file has full complete data because no audit columns are present
- The data is of higher
  - have a back up of the previously received file
  - perform a comparison of the current file and prior file; create a ‘change file’ by determining the inserts, updates and deletes. Ensure both the ‘current’ and ‘prior’ file are sorted by key fields
  - have a process that reads the ‘change file’ and loads the data into the target table
  - based on the ‘change file’ volume, we could decide whether to do a ‘truncate & load’
- The data is of lower volume
  - do a lookup on the target table and determine whether it’s a new record or an existing record
  - if an existing record then compare the required fields to determine whether it’s an updated record
  - have a process to find the aged records in the target table and do a clean up or delete
    
    Read more about Data Integration Challenge

Pages

Business Intelligence & Analytics

Ads 468x60px

Labels

Friday, 13 July 2007

Data Integration Challenge – Capturing Changes

0 comments:

Post a Comment

Connect Us

MBA Admission 2018

Events and News

My Favorite Links

Blog Archive

My Blog List

Popular Posts

Mamta @ Twitter

About Me

Pages

Business Intelligence & Analytics

Ads 468x60px

Labels

Friday, 13 July 2007

Data Integration Challenge – Capturing Changes

0 comments:

Post a Comment

Social Icons

Connect Us

MBA Admission 2018

Events and News

My Favorite Links

Blog Archive

My Blog List

Popular Posts

Mamta @ Twitter

About Me