Pages

Ads 468x60px

Labels

Friday, 13 July 2007

Data Integration Challenge – Capturing Changes


When we receive the data from source systems, the data file will not carry a flag indicating whether the record provided is new or has it changed. We would need to build process to determine the changes and then push them to the target table.

There are two steps to it
  1. Pull the incremental data from the source file or table

  2. Process the pulled incremental data and determine the impact of it on the target table as Insert or Update or Delete

Step 1: Pull the incremental data from the source file or table
If source system has audit columns like date then we can find the new records else we will not be able to find the new records and have to consider the complete data
For source system’s file or table that has audit columns, we would follow the below steps
  1. While reading the source records for a day (session), find the maximum value of date(audit filed) and store in a persistent variable or a temporary table
  2. Use this persistent variable value as a filter in the next day to pull the incremental data from the source table

Step 2: Determine the impact of the record on target table as Insert/Update/ Delete 
Following are the scenarios that we would face and the suggested approach
  1. Data file has only incremental data from Step 1 or the source itself provide only incremental data

    • do a lookup on the target table and determine whether it’s a new record or an existing record
    • if an existing record then compare the required fields to determine whether it’s an updated record
    • have a process to find the aged records in the target table and do a clean up for ‘deletes’

  2. Data file has full complete data because no audit columns are present

    • The data is of higher

      • have a back up of the previously received file
      • perform a comparison of the current file and prior file; create a ‘change file’ by determining the inserts, updates and deletes. Ensure both the ‘current’ and ‘prior’ file are sorted by key fields
      • have a process that reads the ‘change file’ and loads the data into the target table
      • based on the ‘change file’ volume, we could decide whether to do a ‘truncate & load’
    • The data is of lower volume

      • do a lookup on the target table and determine whether it’s a new record or an existing record
      • if an existing record then compare the required fields to determine whether it’s an updated record
      • have a process to find the aged records in the target table and do a clean up or delete


Friday, 6 July 2007

Business Intelligence Utopia – Dream to Reality: Key Enablers


In the last post, I discussed my view of BI Utopia in which information is available to all stakeholders at the right time, in the right format enabling them to make actionable decisions at both strategic & operational levels. Having said that, the BI street is not paved with gold.

I consider the following key enablers as pre-requisites to achieve true ‘Information Democracy’ in an enterprise. The “Power of Ten” includes:
  1. Proliferation of agile, modular & robust transaction processing systems.

  2. Real Time Data Integration Components

  3. Strong Data Governance structure

  4. Service Oriented Architecture

  5. Extensible Business centric Data Models

  6. Flexible business rules repositories surrounded by clean metadata/reference data environments

  7. Ability to integrate unstructured information into the BI architectural landscape

  8. Guided context-sensitive, user-oriented analytics

  9. Analytical models powered by Simulations

  10. Closed loop Business Intelligence Utopia

Each of these units comprising the “Power of Ten” is a fascinating topic on its own. We will drill-down and focus on some of the salient features of each of these areas in the coming weeks.