When we receive the data from source systems, the data file will not carry a flag indicating whether the record provided is new or has it changed. We would need to build process to determine the changes and then push them to the target table.
There are two steps to it
- Pull the incremental data from the source file or table
- Process the pulled incremental data and determine the impact of it on the target table as Insert or Update or Delete
Step 1: Pull the incremental data from the source file or table
If source system has audit columns like date then we can find the new records else we will not be able to find the new records and have to consider the complete data
For source system’s file or table that has audit columns, we would follow the below steps
- While reading the source records for a day (session), find the maximum value of date(audit filed) and store in a persistent variable or a temporary table
- Use this persistent variable value as a filter in the next day to pull the incremental data from the source table
Step 2: Determine the impact of the record on target table as Insert/Update/ Delete
Following are the scenarios that we would face and the suggested approach
Data file has only incremental data from Step 1 or the source itself provide only incremental data
do a lookup on the target table and determine whether it’s a new record or an existing record
if an existing record then compare the required fields to determine whether it’s an updated record
have a process to find the aged records in the target table and do a clean up for ‘deletes’
Data file has full complete data because no audit columns are present
0 comments:
Post a Comment