Pages

Ads 468x60px

Labels

Showing posts with label Business Intelligence Solutions. Show all posts
Showing posts with label Business Intelligence Solutions. Show all posts

Thursday, 15 May 2008

Data Integration Challenge – Storing Timestamps

Storing timestamps along with a record indicating its new arrival or a change in its value is a must in a data warehouse. We always take it for granted, adding timestamp fields to table structures and tending to miss that the amount of storage space a timestamp field can occupy is huge, the storage occupied by timestamp is almost double against a integer data type in many databases like SQL Server, Oracle and if we have two fields one as insert timestamp and other field as update timestamp then the storage spaced required gets doubled. There are many instances where we could avoid using timestamps especially when the timestamps are being used for primarily for determining the incremental records or being stored just for audit purpose.
How to effectively manage the data storage and also leverage the benefit of a timestamp field?
One way of managing the storage of timestamp field is by introducing a process id field and a process table. Following are the steps involved in applying this method in table structures and as well as part of the ETL process.Data Structure
  1. Consider a table name PAYMENT with two fields with timestamp data type like INSERT_TIMESTAMP and UPDATE_TIEMSTAMP used for capturing the changes for every present in the table
  2. Create a table named PROCESS_TABLE with columns PROCESS_NAME Char(25), PROCESS_ID Integer and PROCESS_TIMESTAMP Timestamp
  3. Now drop the fields of the TIMESTAMP data type from table PAYMENT
  4. Create two fields of integer data type in the table PAYMENT like INSERT_PROCESS_ID and UPDATE_PROCESS_ID
  5. These newly created id fields INSERT_PROCESS_ID and UPDATE_PROCESS_ID would be logically linked with the table PROCESS_NAME and its field PROCESS_ID
ETL Process
  1. Let us consider an ETL process called ‘Payment Process’ that loads data into the table PAYMENT
  2. Now create a pre-process which would run before the ‘payment process’, in the pre-process build the logic by which a record is inserted with the values like (‘payment process’, SEQUNCE Number, current timestamp) into the PAYMENT table. The PROCESS_ID in the payment table could be defined as a database sequence function.
  3. Pass the current_prcoess_id from pre-process step to the ‘payment process’ ETL process
  4. In the ‘payment process’ if a record is to inserted into the PAYMENT table then the current_prcoess_id value is set to both the columns INSERT_PROCESS_ID and UPDATE_PROCESS_ID else if a record is getting updated in the PAYMENT table then the current_process_id value is set to only the column UPDATE_PROCESS_ID
  5. So now the timestamp values for the records inserted or updated in the table PAYMENT can be picked from the PROCESS_TABLE by joining by the PROCESS_ID with the INSERT_PROCESS_ID and UPDATE_PROCESS_ID columns of the PAYMENT tableBenefits
  6.  
  • The fields INSERT_PROCESS_ID and UPDATE_PROCESS_ID occupy less space when compared to the timestamp fields
  • Both the columns INSERT_PROCESS_ID and UPDATE_PROCESS_ID are Index friendly
  • Its easier to handle these process id fields in terms picking the records for determining the incremental changes or for any audit reporting.
Read More about Data Integration

Friday, 29 February 2008

BI Strategy – Approach based on First Principles

Business Intelligence Strategy definition is typically the first step in an organization’s endeavor to implement BI (Business Intelligence). This phase is very crucial as the overall execution direction hinges on decisions taken in this stage.
The precise approach to the BI Strategy definition includes the following steps:
  1. Business Area Identification - Identify and prioritize the business area(s) for which BI is considered. Ex: Human Resource Analytics, Supply Chain Analytics, Enterprise Performance Analytics etc.
  2. Process Mapping Document - Once the business area is identified, map out the individual processes involved in that particular domain. This can be a simple flow-chart that shows the entry and exit criteria for each sub-process.
  3. Business Questions Enumeration – Based on the subject areas involved in the business domain, enumerate the list of questions that are to be answered by the analytical layer.
  4. Data Elements Segregation – For each of the process steps, identify the data elements. These data elements, after subsequent validation (in conjunction with business questions) would translate into dimensions and facts during the data modeling stage.
  5. Data Visualization – Develop a prototype (set of screenshots) on how the data would be visualized for each business question. Business Analysts and domain experts are typically involved at this stage.
  6. BI Architecture Synopsis – At a fundamental level, BI architecture is fairly straightforward. The architecture is almost always a combination of the following processes: Extraction (E), Transformation (T), Loading (L), Cubing (C), and Analyze (Z). The number of layers, type of reporting etc. are a combination of ETLCZ components. Ex: ETLZ, ETLTLCZ, ELTZ, ELCZ are some options for BI architecture definition.
  7. Next Steps Document – The ‘Next Steps’ document would list down the other requirements of / from the analytical infrastructure. These can be points around Tool Evaluation, User profiles, Data volumes, Performance considerations, etc. Each of these requirements would translate to an assessment to be carried out before the actual construction begins.
The most common mistake is to start thinking about technology aspects before the actual business requirement is finalized. A precise definition of business questions goes a long way in designing a scalable and robust BI infrastructure. 
Read More about  BI Strategy

Monday, 28 January 2008

Business Intelligence and Six Sigma

I just finished a Six Sigma project and was left wondering as to why BI practitioners are not using more of that Six Sigma power in Business Intelligence. Let me delve on this subject a bit more.
The Six Sigma project that I just completed was on “Developing a Function Point based estimation model for ETL loads”. Essentially, I was facing a lot of problems in estimating the effort for ETL (in this case, Informatica) loads that led to “Effort variances” beyond specified limits. So we kicked off a Six Sigma project that had the following DMAIC phases:
1. Define – Definition of the problem (Ex: Estimation process is out of whack)
2. Measure – We measured the effort variances before the start of the project and also set ourselves a target of where it should be.
3. Analyze – Analyzed the root-cause of the problem. The solution was to let go of the complexity based estimation that was done initially and to adapt Function points. In fact, this FP based estimation model was presented at the International Software Estimation Colloquium last year and won the Runner-up prize (http://www.qaiasia.com/Conferences/sec2007/leadership.htm)
4. Improve – Based on a pilot within the project, the Function points based linear regression model was arrived at and the team was educated on the estimation process. The improvements to the estimation process (effort variances) were measured on a regular basis.
5. Control – Periodic checks to ensure the institutionalization of the process and also fine-tune wherever necessary.
That in a nut-shell is what my Six Sigma project was all about. Basically, Six Sigma tries to improve process efficiencies by following the phases mentioned above.
Now let’s see the connection to Business Intelligence. Analytics at this stage of evolution (in majority of organizations) are being used to find the improvement area at a given point of time. The improvement area can be a problem (Ex: Trend chart showing that the Sales in the West region is dropping by 10% every quarter for the last 3 quarters) or an opportunity (Ex: Market potential for a product is huge and our share is small). BI is reasonably good at providing this information and it will only get better. But BI by itself does not enforce the process / execution rigor that is required for successful organizations.
To summarize, Six Sigma needs an improvement opportunity as the starting point for it to unleash its power to improve processes. BI generates lot of these opportunities with its DW/Reporting/Analytics components but does not enforce the process implementation rigor. I feel that there is lot of synergy in bringing both together – Six Sigma, the left hand and BI, the right hand when brought together can earn a lot of claps in the quest to create learning, performing organizations.
Just to sample the power of Six Sigma techniques, please take a look at the following link:http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html, which illustrates the use of control charts (one of Six Sigma’s potent tools) in metrics / KPI management. Fascinating!
Agree / Not Agree, Have more thoughts on this topic, this post is good / rubbish, for anything – Please do send in your comments.
Information Nugget:Having talked about execution rigor, let me recommend one of the best books I have read in that area. “Execution – The Discipline of Getting Things Done” by Larry Bossidy and Ram Charan (http://www.amazon.com/Execution-Discipline-Getting-Things-Done/dp/0609610570)