
Friday 13 July 2007

Data Integration Challenge – Capturing Changes


When we receive data from source systems, the data file will not carry a flag indicating whether a record is new or has changed. We need to build a process to determine the changes and then push them to the target table.

There are two steps to this:
  1. Pull the incremental data from the source file or table

  2. Process the pulled incremental data and determine its impact on the target table as an Insert, Update or Delete

Step 1: Pull the incremental data from the source file or table
If the source system has audit columns, such as a last-updated date, we can identify the new records; otherwise we cannot isolate the new records and have to process the complete data set.
For a source file or table that has audit columns, follow the steps below (a minimal sketch follows the list)
  1. While reading the source records for a day (session), find the maximum value of the date (audit field) and store it in a persistent variable or a temporary table
  2. Use this persisted value as a filter the next day to pull only the incremental data from the source table
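
By way of illustration, here is a minimal Python sketch of this pattern, assuming a SQLite source table named src_orders with an ISO-formatted audit column updated_at and a one-row control table etl_control standing in for the persistent variable (all names are illustrative):

```python
import sqlite3

def pull_incremental(conn):
    """Pull only records newer than the last persisted audit value."""
    cur = conn.cursor()
    # The control table plays the role of the persistent variable.
    cur.execute("CREATE TABLE IF NOT EXISTS etl_control (last_audit_ts TEXT)")
    cur.execute("SELECT last_audit_ts FROM etl_control")
    row = cur.fetchone()
    last_ts = row[0] if row else "1900-01-01 00:00:00"

    # Filter the source on the audit column using the persisted value.
    cur.execute(
        "SELECT id, amount, updated_at FROM src_orders WHERE updated_at > ?",
        (last_ts,),
    )
    batch = cur.fetchall()

    # Persist the maximum audit value seen in this run for the next session.
    if batch:
        max_ts = max(r[2] for r in batch)
        cur.execute("DELETE FROM etl_control")
        cur.execute("INSERT INTO etl_control (last_audit_ts) VALUES (?)", (max_ts,))
        conn.commit()
    return batch
```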

Step 2: Determine the impact of the record on the target table as Insert/Update/Delete
Following are the scenarios we would face and the suggested approach for each:
  1. The data file has only incremental data (from Step 1), or the source itself provides only incremental data (a lookup sketch follows this list)

    • do a lookup on the target table to determine whether each record is new or existing
    • if it is an existing record, compare the required fields to determine whether it is an updated record
    • have a process to find the aged records in the target table and clean them up as ‘deletes’

  2. The data file contains the complete data set because no audit columns are present

    • If the data is of higher volume:

      • keep a backup of the previously received file
      • compare the current file against the prior file and create a ‘change file’ by determining the inserts, updates and deletes; ensure both the ‘current’ and ‘prior’ files are sorted by key fields (a comparison sketch also follows this list)
      • have a process that reads the ‘change file’ and loads the data into the target table
      • based on the ‘change file’ volume, decide whether a ‘truncate & load’ is the better option
    • If the data is of lower volume:

      • do a lookup on the target table to determine whether each record is new or existing
      • if it is an existing record, compare the required fields to determine whether it is an updated record
      • have a process to find the aged records in the target table and clean up or delete them
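
By way of illustration, a minimal Python sketch of the lookup-based approach from case 1 (and the lower-volume case above), assuming a SQLite target table tgt_customer keyed by customer_id with one tracked attribute email; all table and column names are illustrative:

```python
import sqlite3

def classify_record(conn, record):
    """Classify one incoming record as 'insert', 'update' or 'no_change'."""
    cur = conn.cursor()
    # Lookup on the target table by the natural key.
    cur.execute("SELECT email FROM tgt_customer WHERE customer_id = ?",
                (record["customer_id"],))
    row = cur.fetchone()
    if row is None:
        return "insert"        # not in the target yet: new record
    if row[0] != record["email"]:
        return "update"        # key exists but a tracked field changed
    return "no_change"         # key exists and nothing we track changed

def apply_record(conn, record):
    """Apply the classified change to the target table."""
    action = classify_record(conn, record)
    cur = conn.cursor()
    if action == "insert":
        cur.execute("INSERT INTO tgt_customer (customer_id, email) VALUES (?, ?)",
                    (record["customer_id"], record["email"]))
    elif action == "update":
        cur.execute("UPDATE tgt_customer SET email = ? WHERE customer_id = ?",
                    (record["email"], record["customer_id"]))
    conn.commit()
    return action
```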
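
And a minimal sketch of the file-comparison step for the higher-volume case, assuming the prior and current extracts are CSV files with identical layouts, already sorted by a single key column named key (file and column names are illustrative):

```python
import csv

def build_change_file(prior_path, current_path, change_path):
    """Compare two key-sorted extracts and write inserts, updates and deletes."""
    with open(prior_path, newline="") as f_prior, \
         open(current_path, newline="") as f_curr, \
         open(change_path, "w", newline="") as f_out:
        prior = csv.DictReader(f_prior)
        curr = csv.DictReader(f_curr)
        writer = csv.DictWriter(f_out, fieldnames=["change_type"] + curr.fieldnames)
        writer.writeheader()

        p, c = next(prior, None), next(curr, None)
        while p is not None or c is not None:
            if c is None or (p is not None and p["key"] < c["key"]):
                writer.writerow({"change_type": "DELETE", **p})   # only in prior file
                p = next(prior, None)
            elif p is None or c["key"] < p["key"]:
                writer.writerow({"change_type": "INSERT", **c})   # only in current file
                c = next(curr, None)
            else:
                if p != c:                                        # same key, fields differ
                    writer.writerow({"change_type": "UPDATE", **c})
                p, c = next(prior, None), next(curr, None)
```

A downstream process can then read the change file and apply each row to the target table, or fall back to a ‘truncate & load’ when the change volume approaches the full file size.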


Friday 6 July 2007

Business Intelligence Utopia – Dream to Reality: Key Enablers


In the last post, I discussed my view of BI Utopia in which information is available to all stakeholders at the right time, in the right format enabling them to make actionable decisions at both strategic & operational levels. Having said that, the BI street is not paved with gold.

I consider the following key enablers as pre-requisites to achieve true ‘Information Democracy’ in an enterprise. The “Power of Ten” includes:
  1. Proliferation of agile, modular & robust transaction processing systems.

  2. Real Time Data Integration Components

  3. Strong Data Governance structure

  4. Service Oriented Architecture

  5. Extensible Business centric Data Models

  6. Flexible business rules repositories surrounded by clean metadata/reference data environments

  7. Ability to integrate unstructured information into the BI architectural landscape

  8. Guided context-sensitive, user-oriented analytics

  9. Analytical models powered by Simulations

  10. Closed loop Business Intelligence Utopia

Each of these units comprising the “Power of Ten” is a fascinating topic on its own. We will drill down and focus on some of the salient features of each of these areas in the coming weeks.

Friday 29 June 2007

Data Integration Challenge – Facts Arrive Earlier than Dimension


Fact transactions that arrive earlier than their dimension (master) records are not bad data; such fact records need to be handled in our ETL process as a special case. Situations where facts come in before dimensions occur quite commonly, for example when a customer opens a bank account and his transactions start flowing into the data warehouse immediately.
But the customer id creation process in the Customer Reconciliation System can get delayed, and hence the customer data would reach the data warehouse only after a few days.
How we handle this scenario differs based on the business process being addressed; there are two different requirements
  • Make the fact available and report under “In Process” category; commonly followed in financial reporting systems to enable reconciliation
  • Make the fact available only when the dimension is present; commonly followed in status reporting systems
Requirement 1: Make the fact available and report under “In Process” category
For this requirement, follow the steps below (a minimal sketch follows the list)
  1. Insert into the dimension table a record that represents a default or ‘In Process’ status; in the banking example, the Customer Dimension would have a ‘default record’ inserted to indicate that the customer details have not yet arrived

  2. In the ETL process, while populating the Fact table, assign a default Dimension key to the transactions that do not have a corresponding entry in the Dimension table and insert them into the Fact. In the same process, insert the Dimension lookup values into a ‘temporary’ or ‘error’ table

  3. Build an ETL process that checks the new records inserted into the Dimension table, queries the temporary table to identify the fact records whose dimension key has to be updated, and updates the respective fact’s dimension key
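
A minimal Python/SQLite sketch of steps 2 and 3, assuming illustrative tables dim_customer (surrogate key customer_key, natural key customer_id), fact_txn, and a ‘temporary’ table pending_customer_keys that records the lookup values which did not resolve:

```python
import sqlite3

DEFAULT_KEY = -1  # surrogate key of the 'In Process' default dimension record

def load_fact_row(conn, customer_id, amount):
    """Load one transaction, assigning the default key when the dimension is late."""
    cur = conn.cursor()
    cur.execute("SELECT customer_key FROM dim_customer WHERE customer_id = ?",
                (customer_id,))
    row = cur.fetchone()
    key = row[0] if row else DEFAULT_KEY
    cur.execute("INSERT INTO fact_txn (customer_key, customer_id, amount) "
                "VALUES (?, ?, ?)", (key, customer_id, amount))
    if row is None:
        # Remember the natural key so the fact can be re-keyed later (step 3).
        cur.execute("INSERT INTO pending_customer_keys (customer_id) VALUES (?)",
                    (customer_id,))
    conn.commit()

def rekey_pending_facts(conn):
    """Once the real dimension rows arrive, point the parked facts at them."""
    cur = conn.cursor()
    cur.execute("""
        UPDATE fact_txn
           SET customer_key = (SELECT d.customer_key
                                 FROM dim_customer d
                                WHERE d.customer_id = fact_txn.customer_id)
         WHERE customer_key = ?
           AND customer_id IN (SELECT customer_id FROM pending_customer_keys)
           AND customer_id IN (SELECT customer_id FROM dim_customer)
    """, (DEFAULT_KEY,))
    cur.execute("""
        DELETE FROM pending_customer_keys
         WHERE customer_id IN (SELECT customer_id FROM dim_customer)
    """)
    conn.commit()
```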

Requirement 2: Make the fact available only when the dimension is present
For this requirement, follow the steps below (a minimal sketch follows the list)
  1. Build an ETL process that populates the fact into a staging table

  2. Build an ETL process that pushes only the records that have a dimension value to the data warehouse tables

  3. At the end of the ETL process, delete all the processed records from the staging table, making the remaining unprocessed records available to be pulled next time
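
A minimal Python/SQLite sketch of steps 2 and 3, assuming illustrative tables stg_fact (the staging table), dim_customer and fact_txn:

```python
import sqlite3

def push_matched_facts(conn):
    """Move only staged facts whose dimension already exists; keep the rest staged."""
    cur = conn.cursor()
    # Push records that have a matching dimension row into the warehouse fact.
    cur.execute("""
        INSERT INTO fact_txn (customer_key, amount, txn_date)
        SELECT d.customer_key, s.amount, s.txn_date
          FROM stg_fact s
          JOIN dim_customer d ON d.customer_id = s.customer_id
    """)
    # Remove only the processed rows; unmatched rows stay for the next run.
    cur.execute("""
        DELETE FROM stg_fact
         WHERE customer_id IN (SELECT customer_id FROM dim_customer)
    """)
    conn.commit()
```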

You can read more about Data Integration Challenge

Monday 25 June 2007

Business Intelligence: Gazing at the Crystal Ball


Circa 2015 – 8 years from now
The CEO of a multinational organization enters the corner office overlooking the busy city down below. On flicking a switch near the seat, the wall in front is illuminated with a colorful dashboard, known in CEO circles then as the Rainbow Chart.
The Rainbow Chart is the CEO’s lifeline, as it gives a snapshot of the current business position (the left portion) and also figures/colors that serve as a premonition of the company’s future (the right portion).
The current state/left portion of the dashboard, on closer examination, reveals 4 sub-parts. On the extreme left is the Balance Sheet of the business and next to it is the Income statement. The Income statement has more colors that are changing dynamically as compared to the Balance sheet. Each line item has links to it, using which the CEO can drill down further to specific geographies, business units and even further to individual operating units. The third part has the cash flow details (the colors are changing far more rapidly here) and the fourth one gives the details on inventory, raw materials position and other operational details.
The business future state/right portion of the dashboard has a lot of numbers that can be categorized into two. The first category is specific to the business – Sales in pipeline, Revenue & Cost projections, Top 5 initiatives, Strategy Maps etc. – and the second comprises the macroeconomic indicators across the world. At the bottom of the dashboard is a stock ticker (what else?) with the company’s stock price shown in bold.
All these numbers & colors change in real-time and the CEO can drill up/down/across/through all the line items. Similar such dashboards are present across the organization and each one covers details that are relevant for the person’s level and position in the company.
This in essence is the real promise of BI.
Whether it happens in 2015 or earlier (hopefully not later!) is open to speculation, but the next few blogs from my side will zero in on some of the pre-requisites for such a scenario – the Business Intelligence Utopia!

Business Intelligence @ Crossroads


Business Intelligence (BI) is well & truly at the crossroads, and so are BI practitioners like me. On one hand, there is tremendous improvement in BI tools & techniques almost on a daily basis; on the other hand, there is still a big expectation gap among business users about Business Intelligence’s usage and value in driving core business decisions.

This ensures that every Business Intelligence (BI) practitioner develops a ‘split’ personality – a la Jekyll and Hyde, getting fascinated by the awesome power of databases, smart techniques in data integration tools etc. one moment, and the very next moment getting into trouble with a business user on why ‘that’ particular metric cannot be captured in an analytical report.
For the BI technologists, there is never going to be a dull moment in the near future. With all the big product vendors like Microsoft, Oracle, SAP etc. throwing their might behind BI and with all the specialty BI product vendors showing no signs of slowing down, just get ready to join the big swinging party.
For the business users, there is still the promise of BI that is very enticing – ‘Data to Information to Knowledge to Actions that drive business decisions’. But they have not delivered their verdict as of now. Operational folks are really not getting much out of BI right now (wait for BI 2.0), and the strategic thinkers are not completely satisfied with what they get to see.
The techno-functional managers, the split-personality types, are the ones in the middle, trying to grapple with increasing complexity on the technology side and the ever-increasing clamor for insights from the business side.
Take sides right away – there is more coming from this space on the fascinating world of Business Intelligence.