
Monday 9 June 2008

Hybrid OLAP – The Future of Information Delivery

As I get to see more Enterprise BI initiatives, it is becoming increasingly clear (at least to me!) that when it comes to information dissemination, Hybrid Online Analytical Processing (HOLAP) is the way to go. Let me explain my position here.
As you might be aware, Relational (ROLAP), Multi-dimensional (MOLAP) and Hybrid OLAP (HOLAP) are the 3 modes of information delivery for BI systems. In a ROLAP environment, the data is stored in a relational structure and is accessed through a semantic layer (usually!). MOLAP, on the other hand, stores data in a proprietary format, providing users with the notion of a multi-dimensional cube. HOLAP combines the power of both ROLAP and MOLAP systems and, with the rapid improvements made by BI tool vendors, seems to have finally arrived on the scene.
In my mind, the argument for subscribing to the HOLAP paradigm goes back to the “classic” article by Ralph Kimball on the different types of fact table grains. According to him, there are 3 types of fact tables – transaction grained, periodic snapshot and accumulating snapshot – and at least 2 of them are required to model a business situation completely. From an analytical standpoint, this means that operational data has to be analyzed along with summarized data (snapshots) for business users to take informed decisions.
Traditionally, the BI world has handled this problem in 2 ways:
1) Build everything on the ROLAP architecture. Handle the summarization either on the fly or thro’ summarized reporting tables at the database level. This is not a very elegant solution, as everybody in the organization (even those analysts working only with summarized information) gets penalized by the slow performance of SQL queries issued against the relational database through the semantic layer.
2) Profile users and segregate operational analysts from strategic analysts. Operational users are provided ROLAP tools while business users working primarily with summarized information are provided their “own” cubes (MOLAP) for high-performance analytics.
Both solutions are rapidly becoming passé. In many organizations now, business users want to look at summarized information and, based on what they see, need the facility to drill down to granular-level information. A good example is the case of analyzing ledger information (Income Statement & Balance Sheet) and then drilling down to journal entries as required. All this drilling down has to happen through a common interface – either an independent BI tool or an enterprise portal with an underlying OLAP engine.
This is the world of HOLAP and it is here to stay. The technology improvement that is making this possible is the relatively new wonder-kid, XMLA (XML for Analysis). More about XMLA in my subsequent posts.
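To make the idea concrete, here is a minimal sketch (in Python) of how a HOLAP-style router might decide whether a request can be answered from a summarized cube or must drill through to the relational detail store. The grain names and the query_cube / query_relational helpers are hypothetical placeholders for real MOLAP and ROLAP calls, not any product’s API.

```python
# Minimal HOLAP routing sketch: answer summary-grain questions from the cube,
# drill through to the relational store for transaction-grain detail.
# Grain names and back-end calls are illustrative, not tied to any product.

CUBE_GRAINS = {"year", "quarter", "month"}          # grains held in the MOLAP cube
DETAIL_GRAINS = {"journal_entry", "transaction"}    # grains only in the relational store

def query_cube(measure, grain, filters):
    # placeholder for an MDX/XMLA call against the cube
    return f"CUBE: {measure} by {grain} where {filters}"

def query_relational(measure, grain, filters):
    # placeholder for a SQL query issued through the semantic layer
    return f"SQL: {measure} by {grain} where {filters}"

def answer(measure, grain, filters):
    """Route the request to the cube when possible, else drill to detail."""
    if grain in CUBE_GRAINS:
        return query_cube(measure, grain, filters)
    if grain in DETAIL_GRAINS:
        return query_relational(measure, grain, filters)
    raise ValueError(f"Unknown grain: {grain}")

if __name__ == "__main__":
    # Ledger analysis at summary grain, then drill down to journal entries.
    print(answer("net_income", "quarter", {"year": 2008}))
    print(answer("amount", "journal_entry", {"account": "4000", "quarter": "Q1-2008"}))
```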
As an example of HOLAP architecture, you can take a look at this link to understand the integration of Essbase cubes (MOLAP at its best) with OBIEE (Siebel Analytics – a ROLAP platform) to provide a common semantic model for end-user analytics.
Information Nugget: If you are interested in Oracle Business Intelligence, please do stop by http://www.rittmanmead.com/blog/ to read the blog posts there. The articles are very informative and thoroughly practical.
Thanks for reading. Please do share your thoughts.

Monday 2 June 2008

Data Integration Challenge – Parent-Child Record Sets, Child Updates

There are certain special sets of records, like a Loan and its Guarantor details in a banking system, where each Loan record can have one or more Guarantor records. In a similar way, for a services-based industry, Contracts and their contract Components exist. These sets can be called parent-child records, where for one parent record (like a Loan) we might have zero to many child records (Guarantors).
During data modeling we would have one table for the parent-level record and its attributes, and a separate table for the child records and their attributes.
As part of the data load process, I have seen situations where a complete refresh (delete & insert) of the child records is required whenever there is a change in certain attributes of a parent record. This requirement can be implemented in different ways; here we will look at one of the best ways to get it accomplished.
The following steps would be involved in the ETL process:
  1. Read the parent-child record
  2. Determine if a change has occurred in the incoming parent record
  3. If a change has occurred, issue a delete for the particular set of child records
  4. Write the corresponding incoming new child records into a flat file
  5. Once steps 1 to 4 are completed for all parent records, have another ETL flow bulk load the records from the flat file into the child table
We didn’t issue an insert of the new incoming child records right after the delete because the deleted records wouldn’t have been committed yet and an insert could lock the table. We could issue a commit after every delete and then follow it with an insert, but committing after each delete would be costlier; writing the inserts to a file handles this situation perfectly.
Also, the option of inserting first with a different key and then deleting the older records would be costlier in terms of locating the records that need to be deleted.
We could also have looked at the option of updating the records in place of deleting them, but then we would at times end up having dead records in the child tables: records that have been deleted in the source would still exist in the target child table. Also, updating a record can disturb contiguous memory, whereas delete and insert keep the pages intact.
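As a rough illustration, here is a minimal sketch of this flow; the table name, tracked attributes and CSV staging file are hypothetical, and the database side is reduced to a list of parent keys to clear rather than a real ETL tool’s API.

```python
import csv

# Sketch of the parent-child refresh described above (all names are illustrative):
# steps 1-3: if a tracked parent attribute changed, mark that parent's children for deletion;
# step 4:    write the incoming child records to a flat file;
# step 5:    a separate flow bulk-loads the flat file into the child table.

TRACKED_ATTRS = ["loan_status", "loan_amount"]   # parent attributes that trigger a child refresh

def parent_changed(incoming, existing):
    """Step 2: has any tracked attribute of the parent changed?"""
    return any(incoming[a] != existing.get(a) for a in TRACKED_ATTRS)

def stage_child_refresh(incoming_parents, existing_parents, child_file="guarantor_load.csv"):
    parents_to_clear = []                         # loan_ids whose child rows must be deleted
    with open(child_file, "w", newline="") as f:
        writer = csv.writer(f)
        for parent in incoming_parents:
            existing = existing_parents.get(parent["loan_id"], {})
            if parent_changed(parent, existing):
                parents_to_clear.append(parent["loan_id"])
                for child in parent["guarantors"]:          # step 4: stage new child rows
                    writer.writerow([parent["loan_id"], child["guarantor_id"], child["name"]])
    return parents_to_clear, child_file

if __name__ == "__main__":
    incoming = [{"loan_id": 1, "loan_status": "ACTIVE", "loan_amount": 500,
                 "guarantors": [{"guarantor_id": 10, "name": "A"}]}]
    existing = {1: {"loan_status": "CLOSED", "loan_amount": 500}}
    to_clear, path = stage_child_refresh(incoming, existing)
    # step 3 would be a set-based delete; step 5 a bulk load of the staged file
    print("delete children of loans:", to_clear, "| bulk load file:", path)
```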


Monday 19 May 2008

Let’s talk EPM – Part 2 on Metrics Profiling

In my earlier post on Enterprise Performance Management (EPM), I had enumerated the six steps of a practical EPM strategy in an organization. They were:
  1. Business Process Maps – Understand the business process
  2. Metrics Identification – Get hold of the metrics
  3. Metrics Profiling – Understand the metrics in depth
  4. Metrics Maps – Understand the cause and effect relationships between metrics
  5. Metrics Visualization – Implementation of Metric Maps on BI Tools
  6. Watch and Improve – Monitor Metrics and Improve business process as required
It is important to realize that building a data warehouse (enterprise wide) or data mart (functional area wise) or simply an integrated, subject-oriented data repository (without getting lost in semantics!) is implicit in the set of steps outlined above.
Steps 1 and 2 (Business Process and Metrics Identification) are self-explanatory. Though getting hold of the right metrics is easier said than done, it is fairly well understood that the measures/metrics selected for analysis should align with the organization’s mission, business model and value creation aspects.
Step 3 – Metrics Profiling – is, in my opinion, the step most often missed in EPM implementations and arguably a major cause of failures in such programs. Metrics profiling, stated simply, is a way of understanding your metrics in depth. Given below is a sample template for profiling your metrics, which can be customized for each organization.
Profiling Parameters:
1. Metric Name – Name of the metric
2. Metric Definition – Brief definition of the metric
3. Metric Type – Is it a ratio, absolute number, trended value, etc.
4. Sources of Data – Identify the source of data for the metric and the owners
5. Application – Brief description of how the metric helps in managing the business better
6. Potentially Affected Metrics – Identify the other metrics that are impacted (positively or negatively) by this metric
7. Example – Provide an example of the metric’s usage. (For example: ABC Computers released three new product lines during the last 12 months, generating $15 million in new revenue out of total annual revenue of $125 million. New Products Index = 15 ÷ 125 = 12%)
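To illustrate, the template above can be captured as a simple record; the field names mirror the profiling parameters, and the worked numbers come from the New Products Index example in item 7. This is only a sketch – the structure and example values are illustrative, not a prescribed format.

```python
from dataclasses import dataclass, field

# A metric profile record mirroring the profiling template above (illustrative sketch only).

@dataclass
class MetricProfile:
    name: str
    definition: str
    metric_type: str            # ratio, absolute number, trended value, ...
    sources_of_data: list
    application: str
    affected_metrics: list = field(default_factory=list)
    example: str = ""

new_products_index = MetricProfile(
    name="New Products Index",
    definition="Share of total revenue generated by products released in the last 12 months",
    metric_type="ratio",
    sources_of_data=["Product master", "Revenue ledger"],
    application="Indicates how well innovation is converting into revenue",
    affected_metrics=["R&D Spend Ratio", "Revenue Growth"],
    example="$15M new-product revenue out of $125M total annual revenue",
)

# Worked example from item 7: 15 / 125 = 12%
print(f"{new_products_index.name} = {15 / 125:.0%}")
```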
Metrics Profiling is a very important step in the implementation of an enterprise-wide performance management system. I will discuss the other aspects of EPM in my subsequent posts.
Thanks for reading.

Wednesday 9 April 2008

Data Integration Challenge – Initial Load – II

The other challenges during the initial load are:
  • System Performance
  • Catching Up
System Performance is always a challenge during the initial load, especially when many years of history data are to be loaded; there is an increase in the usage of system resources because of the huge data load, which wouldn’t happen during a regular incremental load. Some of the ways of handling the system performance issue during the initial load are:
  • Group the data loads by filters like year or customer and load the related data in chunks. We could load data for the months Jan, Feb, Mar, or load the customers region-wise from NY, followed by NJ, etc. Such grouping of records for loading eliminates the data surge and also provides a better way to perform data validation
  • As data gets loaded into the warehouse, the data required for lookups from the warehouse becomes huge; we need to decide the required lookup records based on the incoming data. For example, if the warehouse has data for all the regions – North, South, East, West – and the incoming data currently has only North data, then we need an override filter to access only the data pertaining to North from the warehouse
  • We could plan to increase the available memory of the ETL server for a temporary period
  • Avoid sequential inserts; write data to a file, sort and bulk load
  • Determine and plan for the additional disk space required for the initial load files that are extracted and provided by the source systems
Catching Up is an interesting problem wherein the warehouse is not able to cope and deliver data as current (or one day old) as in the source system. This problem is more often due to ETL performance issues, where even before the initial data is successfully loaded and verified in the DW, an additional set of new records has arrived from the source, and the DW is not able to catch up.
Though at times some code tuning, running things in parallel or a hardware upgrade (usually a costly one at a later stage) could resolve such problems, in certain situations this problem could run into an unsolvable state where the complete ETL architecture has to be re-examined.
One other way to manage such situations is to have the daily process of loading current data proceed independently while, in parallel, a separate set of processes brings in the history data on a regular basis. In certain scenarios we might also need to build a process that runs and syncs up the current data with the old data, especially any aggregate data designed in the data model.
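To picture the orchestration, here is a minimal sketch of a nightly cycle that keeps the incremental load independent while backfilling history in year-sized chunks and re-syncing aggregates. Every function body is a placeholder for real ETL jobs; the year list and schedule are assumptions made purely for illustration.

```python
# Orchestration sketch: the daily incremental load runs independently while history
# is backfilled in chunks (here, by year), followed by an aggregate sync step.
# Every load_* / sync_* function stands in for a real ETL job.

HISTORY_YEARS = [2003, 2004, 2005, 2006, 2007]   # chunks to backfill, oldest first

def load_incremental(run_date):
    print(f"incremental load for {run_date}")

def load_history_chunk(year):
    print(f"history backfill for year {year}")

def sync_aggregates(year):
    print(f"re-aggregate summaries touching {year}")

def nightly_cycle(run_date, pending_history):
    """One nightly window: current data first, then one history chunk if any remain."""
    load_incremental(run_date)                   # keep the warehouse current
    if pending_history:
        year = pending_history.pop(0)            # take the next history chunk
        load_history_chunk(year)
        sync_aggregates(year)                    # keep aggregates consistent with new history

if __name__ == "__main__":
    pending = list(HISTORY_YEARS)
    for day in ["2008-04-07", "2008-04-08", "2008-04-09"]:
        nightly_cycle(day, pending)
```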

Tuesday 18 March 2008

Data Integration Challenge – Initial Load – I


In a data warehouse, all tables usually go through two phases of the data load process: the initial load and the incremental load. ‘History Load’ or ‘Initial Seeding/Load’ involves a one-time load of the past years of source transaction system data into the data management system. The process of adding only the new records (updates or inserts) to the data warehouse tables, either daily or on a predefined frequency, is called ‘Incremental Load‘. Also, certain tables that are small in size, largely independent and receive full data (current data + history data) as input would be loaded by means of a ‘Full Refresh‘; this involves a complete delete and reload of data.

Code tables, especially, would usually undergo a one-time initial load and may not require a regular incremental load; incremental load is common for fact tables. The initial load of a data warehouse system is quite a challenge in terms of getting it completed successfully within the planned timeframe. Some of the surprises or challenges faced in completing the history load are:
  1. Handling invalid records
  2. Data Reconciliation
  3. System performance
  4. Catching up
Handling Invalid Records:
The occurrence of invalid records becomes much more prominent as we process history data, which was collected into the source system long ago and might not fit the current business rules. Records from a source system can become invalid in the data warehouse for multiple reasons, like an invalid domain value for a column, a null value for a non-nullable field, or aggregate data not matching the detail data. The ways of handling this problem effectively are:
  • Determine the years of data to be loaded into the data warehouse very early and ensure that data profiling is performed on sample data for all the years that have to be loaded. This ensures that most of the data validation rules are identified up front and built as part of the ETL process. In certain situations we may have to build separate data validation and transformation logic based on the year and the data
  • Especially in situations like re-platforming or migrating an existing data warehouse to a new platform, even before running the data through the regular ETL process we might need to load the old data into a data validation (staging) area, where the data is analyzed and cleaned, and then load the data into the data warehouse through the regular ETL process
  • Design the ETL process to divert all the key values of the invalid records to a separate set of tables. At some sites the customer just needs to be made aware of the error records and is fine if these records don’t get aligned into the current warehouse, but at times the invalid records are corrected and reloaded
  • For certain scenarios like aggregate data not matching detail data, though we would usually derive the aggregate from the detail data, at times we might also generate detail data to match the aggregate data
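A minimal sketch of the ‘divert invalid records’ idea is shown below: each incoming row is checked against a couple of validation rules, valid rows continue to the load, and the keys and reasons for invalid rows go to a separate error set for later correction and reload. The column names and rules are made up for illustration.

```python
# Sketch of diverting invalid records during the initial load.
# Valid rows continue to the warehouse load; keys and reasons of invalid rows
# are captured in an error set (standing in for a separate error table).

def validate(row):
    """Return a list of rule violations for one incoming record (illustrative rules)."""
    errors = []
    if row.get("region") not in {"NORTH", "SOUTH", "EAST", "WEST"}:
        errors.append("invalid domain value for region")
    if row.get("account_id") is None:
        errors.append("null value for non-nullable account_id")
    return errors

def split_records(rows):
    valid, invalid = [], []
    for row in rows:
        errors = validate(row)
        if errors:
            invalid.append({"key": row.get("account_id"), "errors": errors})
        else:
            valid.append(row)
    return valid, invalid

if __name__ == "__main__":
    sample = [
        {"account_id": 1, "region": "NORTH"},
        {"account_id": None, "region": "NORTH"},
        {"account_id": 3, "region": "CENTRAL"},
    ]
    valid, invalid = split_records(sample)
    print(len(valid), "rows to load;", invalid)
```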
Data Reconciliation:
Once the initial load from the source system into the data warehouse has been completed, we have to validate that the data has been moved in correctly.
  • Loading records in groups separated by years, or by any logical grouping like customer or product, gives better control in terms of data validation. In general, the data validations performed are counts and sums, tied to certain business-specific validation rules – for example, all customers from region ‘A’ belonging to division ‘1’ in the source should be classified under division ‘3’ in the current warehouse.
  • All the validations that need to be performed after the initial load for each data group have to be prepared and verified with the business team. Many a time the data is validated by the business through ad hoc queries, though the same can be verified by an automated ETL process run by the data warehouse team
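A count-and-sum reconciliation for each data group might look like the sketch below; grouping by year, the ‘amount’ column and the tolerance are assumptions, and the two input sets stand in for the source extract and the loaded warehouse data.

```python
# Reconciliation sketch: compare record counts and amount sums per data group
# (here grouped by year) between the source extract and the loaded warehouse data.

from collections import defaultdict

def summarize(rows):
    """Count and sum of 'amount' per year for a set of rows."""
    summary = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for row in rows:
        s = summary[row["year"]]
        s["count"] += 1
        s["sum"] += row["amount"]
    return summary

def reconcile(source_rows, warehouse_rows, tolerance=0.01):
    src, dw = summarize(source_rows), summarize(warehouse_rows)
    mismatches = []
    for year in set(src) | set(dw):
        s = src.get(year, {"count": 0, "sum": 0.0})
        d = dw.get(year, {"count": 0, "sum": 0.0})
        if s["count"] != d["count"] or abs(s["sum"] - d["sum"]) > tolerance:
            mismatches.append((year, s, d))
    return mismatches

if __name__ == "__main__":
    source = [{"year": 2006, "amount": 100.0}, {"year": 2006, "amount": 50.0}]
    warehouse = [{"year": 2006, "amount": 100.0}]
    print(reconcile(source, warehouse))   # flags 2006: counts and sums differ
```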
We shall discuss the other challenges in Part II.

Friday 29 February 2008

BI Strategy – Approach based on First Principles

Business Intelligence Strategy definition is typically the first step in an organization’s endeavor to implement BI (Business Intelligence). This phase is very crucial as the overall execution direction hinges on decisions taken in this stage.
The precise approach to the BI Strategy definition includes the following steps:
  1. Business Area Identification - Identify and prioritize the business area(s) for which BI is considered. Ex: Human Resource Analytics, Supply Chain Analytics, Enterprise Performance Analytics etc.
  2. Process Mapping Document - Once the business area is identified, map out the individual processes involved in that particular domain. This can be a simple flow-chart that shows the entry and exit criteria for each sub-process.
  3. Business Questions Enumeration – Based on the subject areas involved in the business domain, enumerate the list of questions that are to be answered by the analytical layer.
  4. Data Elements Segregation – For each of the process steps, identify the data elements. These data elements, after subsequent validation (in conjunction with business questions) would translate into dimensions and facts during the data modeling stage.
  5. Data Visualization – Develop a prototype (set of screenshots) on how the data would be visualized for each business question. Business Analysts and domain experts are typically involved at this stage.
  6. BI Architecture Synopsis – At a fundamental level, BI architecture is fairly straightforward. The architecture is almost always a combination of the following processes: Extraction (E), Transformation (T), Loading (L), Cubing (C) and Analyze (Z). The number of layers, type of reporting, etc. determine the combination of ETLCZ components. Ex: ETLZ, ETLTLCZ, ELTZ, ELCZ are some options for BI architecture definition (see the sketch after this list).
  7. Next Steps Document – The ‘Next Steps’ document would list down the other requirements of / from the analytical infrastructure. These can be points around Tool Evaluation, User profiles, Data volumes, Performance considerations, etc. Each of these requirements would translate to an assessment to be carried out before the actual construction begins.
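One way to read the ETLCZ notation is as a pipeline of named stages. The short sketch below (purely illustrative, with no real stage implementations) simply expands an architecture code such as ‘ELCZ’ into its ordered steps.

```python
# Sketch: interpret an ETLCZ-style architecture string as an ordered pipeline of stages.
# Stage meanings follow the list above; only the composition idea is shown.

STAGES = {
    "E": "Extraction",
    "T": "Transformation",
    "L": "Loading",
    "C": "Cubing",
    "Z": "Analyze",
}

def describe(architecture):
    """Expand an architecture code like 'ELCZ' into its ordered stage names."""
    return [STAGES[letter] for letter in architecture]

if __name__ == "__main__":
    for option in ["ETLZ", "ETLTLCZ", "ELTZ", "ELCZ"]:
        print(option, "->", " -> ".join(describe(option)))
```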
The most common mistake is to start thinking about technology aspects before the actual business requirement is finalized. A precise definition of business questions goes a long way in designing a scalable and robust BI infrastructure. 

Wednesday 16 January 2008

Linking BI Technology, Process and People – A Theory of Constraints (TOC) View


With the advent of a new year, let me do a recap of what I have discussed through my 15 odd posts in 2007 and also set the direction for my thoughts in 2008.
I started with the concept (http://blogs.hexaware.com/business_intelligence/2007/06/business-intell.html) of BI Utopia, in which information is available to all stakeholders at the right time and in the right format. The bottom line is to help organizations compete on analytics in the marketplace. With that concept as the starting point, I explored some technology enablers like Real Time Data Integration, Data Modeling, Service Oriented Architecture, etc. and also some implementation enablers like the Agile Framework, Calibration of DW systems and Function Point based estimation for DW. In my post on Data Governance, I introduced the position of a CDO (Chief Data Officer) to drive home the point that nothing is possible (at least in BI) without people!
To me, BI is about 3 things – Technology, Process and People. I consider these three the holy triumvirate for successful implementation of Business Intelligence in any organization. Not only are the individual areas important in themselves, but the most important thing is the link between these 3 areas. Organizations that are serious about ‘Analytics’ should continuously elevate their technology, process & people capability and, more importantly, strengthen the links between them – after all, any business endeavor is only as good as its weakest link.
Theory of Constraints (http://en.wikipedia.org/wiki/Theory_of_Constraints) does offer a perspective, which I feel is really useful for BI practitioners. I will explore more of this in my subsequent posts.
My directions in 2008 for posts on this blog are:
  1. Continue with my thoughts on Business Intelligence along Technology, Process and People dimensions

  2. Provide a “Theory of Constraints” based view of BI with focus on strengthening the link between the 3 dimensions mentioned above.

Almost every interesting business area – Six Sigma, Balanced Scorecard, System Dynamics, Business Modeling, Enterprise Risk, Competitive Intelligence, etc. – has a relationship with BI, and we will see more of this in 2008.
Please do keep reading and share your thoughts as well.

Thursday 3 January 2008

BI Appliances


What is a BI Appliance?
If a data-warehouse-class database product, a reporting product, a data integration product or an all-in-one software package is pre-installed and available in a preconfigured hardware box, then such a “hardware + software” box is called a ‘Business Intelligence Appliance’. The very purpose of the appliance model is to hide the complexity and intricacies of the underlying software components and make it as simple as operating a TV set.
How did the Appliance Model evolve?
As businesses gathered huge amounts of data, the demand for faster and better ways of analyzing data increased, and the data warehouse evolved as a software technology; there have been continuous efforts to build software systems that are cognizant of data warehouse environments.


We have seen IBM and Oracle release their data warehouse specific database editions
We have seen the growth of data warehouse specific databases like RedBrick (now part of IBM), Teradata, Greenplum…
We have seen simple list reporting tools move to proprietary cube data structures and the emergence of the acronyms MOLAP, HOLAP, ROLAP, DOLAP
We had a whole new software market created for ETL and EII products
We have seen more new BI-related software applications being created – BAM, CPM, Metadata Management, Data Discovery and lots more getting defined every day in the market…

As many organizations set up their BI infrastructure or enhanced their existing BI environment with the different BI software packages they needed, they also took on different platforms and hardware, and the maintenance of these became frightening. Getting started with a BI project by itself became a bigger project; we needed to spend considerable time not just on choosing the right set of BI products but also on the supported hardware, dependent software packages and the platform. No BI vendor currently addresses the complete stack of BI system needs, and this has been the driving factor for more acquisitions.
Products like Netezza (a database appliance) and Cast Iron (an ETL appliance) came up with their ‘software in a box’ concept, where we can buy or rent preconfigured ‘hardware + software’ boxes which, in a way, address the needs of the ‘ready to use’ BI market. Many of these boxes have Linux, open source databases, a web server, message queues and proprietary software.
The appliance-based model is not new; IBM has been renting its ‘mainframe + software’ for decades. IBM has addressed the BI market with its ‘Balanced Warehouse’, a preconfigured ‘hardware + software’ offering; its OS can vary across Windows, AIX and Linux, with DB2 as the database, and data reporting can vary across DB2 Cubes, Crystal and Business Objects. HP, in a similar way, has come out with its Neoview platform, which is a revitalized version of the NonStop SQL database and NonStop OS.
A CIO’s need has always been to shorten the application deployment cycle and reduce the maintenance burden of the servers; appliance-based products meet these KRAs of a CIO and are getting accepted widely.
The Future
More Appliances, Focus on Performance:
We will see more BI appliances coming onto the market; as the appliance model hides what’s underneath, and in many cases the details are not available, the buying focus will be more on what the products deliver rather than on what they have inside.
Common Appliance Standards:
Getting the best of breed of software and hardware from a single vendor is not going to happen. We might see both software and hardware vendors defining a set of basic standards among themselves for the appliance model. New organizations would also evolve, similar to “tpc.org”, to define performance standards for appliances. We might see companies similar to DELL coming up which can assemble best-of-breed components and deliver a packaged BI Appliance.
More Acquisitions: The current Business Intelligence market landscape can also be interpreted as
  1. Hardware + Software or Appliance based vendors – HP, IBM
  2. Pure software or Non-Appliance based vendors – Oracle, Microsoft, SAP

Once the current BI software consolidation gets established, the next wave of consolidation will be towards companies like Oracle looking for hardware companies to add to their portfolio.
Technology – Appliance Products
  • Database – Netezza, Teradata, DATAllegro, Dataupia
  • Data Integration – Cast Iron
  • Reporting/Dashboard – Cognos NOW (Celequest LAVA)
  • Configurable Stack (with third party support) – IBM Balanced Warehouse, HP Neoview

Thursday 27 December 2007

“What Management Is” – The crucial link between Business and Intelligence


Let’s for a moment accept the hypothesis that the true intent of Business Intelligence is to help organizations manage their business better. “Better” in this context tends to be a rather elastic adjective as it straddles the entire spectrum of firms using BI for simple management reporting to the other extreme of using BI to ‘Compete on Analytics’ in the marketplace.
“Managing business better” presents the classic question of “What aspects of business can BI help manage better?” The answer – “Pretty much everything”.
In this post, I would like to list down the different business areas that ought to be managed better and drill down into the applicability of BI for each of these areas in future posts. The primary reference for my list is one of the best management books I have read to date – “What Management Is” by Joan Magretta and Nan Stone (http://www.amazon.com/What-Management-Works-Everyones-Business/dp/0743203186). This book really helps in drawing the boundaries around management concepts and, for BI practitioners like me, shows the direction for the evolution and business applicability of BI.
BI practitioners need to understand the following business areas:
  1. Value Creation – BI can help in providing the critical “Outside-in” perspective

  2. Business Model – Is this the right business to be in?

  3. Strategy – Validation and tuning of strategy thro’ Business Intelligence

  4. Organization Boundaries – BI can help solve the Build vs Buy conundrum

  5. Numbers in Business – Really the sweetspot for BI applications

  6. Mission and Measures – Connecting the company’s mission with the measures

  7. Innovation and Uncertainty – Domain of Predictive Analytics & its ilk

  8. Focus – Realm of Pareto’s Law vis-à-vis the more recent “Long-Tail” phenomenon

  9. Managing People – Human Resource Analytics is one of the most happening analytics application areas at this point in time.
Bit of marketing here – after all, this is a corporate blog – my company Hexaware is a specialty provider of HR Analytics solutions. Please do visit http://www.hexaware.com/new_hranalytic.htm for more information.
To me, the list above presents the most comprehensive high-level thought process when confronted with the implementation of BI in organizations. In my consulting engagements, the litmus test is to see whether the BI strategy covers the different aspects of business noted above – “the more the coverage, the better the BI vision”.
Information Nugget
I was quite fascinated by the range of analytical apps available “On-Demand” at http://www.salesforce.com/appexchange/category_list.jsp?NavCode__c=a0130000006P6IoAAK-1 . I personally feel that “On-Demand” does have the potential to disruptively change the way BI services have been delivered to customers. More on that later!
Have a Merry Christmas and a Happy New Year 2008!

Tuesday 18 December 2007

Data Integration Challenge – Building Dynamic DI Systems – II


The following are the design aspects for making a DI system dynamic:
  1. Avoiding hard references, usage of parameter variables

  2. Usage of lookup tables for code conversion

  3. Setting and managing threshold value through tables

  4. Segregating data processing logics into common reusable components

  5. Ensuring that the required processes are controllable by the Business team with the required checks built in

We had covered the first two aspects in the earlier post; let us look at the scenarios and approach for the other three items.
Setting and managing threshold values through tables
In the data validation process we also perform verification of the incoming data in terms of a count or a sum of a variable; in this case the validity of the derived count or sum is verified against a predefined number, usually called the ‘Threshold Value’. Some typical validations of this kind are listed below:
  1. The number of new accounts created should not be more than 10% (the threshold value) of the total records
  2. The number of records received today and the number of records received yesterday cannot vary by more than 250 records
  3. The sum of the credit amount should not be greater than 100,000

The threshold value differs across data sources, but in many cases the metric to be derived is similar across the data sources. We can put these ‘threshold values’ into a relational table and integrate this ‘threshold’ table into the data integration process as a lookup table; this enables the same threshold-based data validation code to be implemented across different data sources while applying each data source’s specific threshold value.
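A minimal sketch of this threshold-driven check is given below: the thresholds live in a lookup structure keyed by data source and metric, so one generic validation runs for every feed while each source keeps its own limits. The table contents, metric names and limit values are illustrative assumptions, and the dictionary stands in for the relational threshold table.

```python
# Threshold-based validation sketch: one generic check, per-source limits
# pulled from a lookup ("threshold") table. All values below are illustrative.

THRESHOLD_TABLE = {
    # (data_source, metric): threshold value
    ("loans_feed", "new_account_pct"): 0.10,      # new accounts <= 10% of total records
    ("loans_feed", "daily_count_delta"): 250,     # day-over-day record count variance
    ("cards_feed", "credit_amount_sum"): 100000,  # sum of the credit amount
}

def check(data_source, metric, observed):
    """Compare an observed value against the source-specific threshold."""
    limit = THRESHOLD_TABLE[(data_source, metric)]
    return {"source": data_source, "metric": metric,
            "observed": observed, "limit": limit, "passed": observed <= limit}

if __name__ == "__main__":
    print(check("loans_feed", "new_account_pct", 0.12))     # fails: 12% > 10%
    print(check("cards_feed", "credit_amount_sum", 87000))  # passes
```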
Segregating Data Processing Logics into Common Reusable Components
Having many reusable components in a system by itself makes a DI system dynamic or adaptable, the reason being that reusable components work on the basic principle of parameterizing the inputs and outputs of an existing process, and parameterization is a key ingredient in making a DI system dynamic. Some of the key characteristics to look for in a DI system that would help carve out a reusable component are:
  1. Multiple data sources providing data for a particular subject area like HR data coming from different HR systems

  2. Same set of data being shared with multiple downstream systems or a data hub system

  3. Existence of an industry standard format like SWIFT or HIPAA, either as source or target

  4. Integration with third party systems or their data like D&B, FairIsaac

  5. Changing data layouts of the incoming data structure

  6. Systems that capture survey data

Ensuring that the required processes are controllable by the Business team with the required checks built in
In many situations we now see requirements where the business provides regular inputs to the IT team running the DI systems; these are situations where we can design and place portions of the DI system parameters under business control. Typical examples of such scenarios are:
  • In ‘threshold value’ based data validation, these values would be business-driven, i.e., the ‘threshold table’ can be managed by the business team, and they would be able to make changes to it without code changes and without IT support
  • In many scenarios the invalid data would undergo multiple passes and need to be validated at each pass by the business; in terms of starting such a session, the input from the business could be just triggering the process or providing input data as well
  • Data to be pulled out from the warehouse based on a feed from an online application; a typical web service problem and solution
The need for the business team to control or feed the DI systems is common with companies that handle more external data, such as market research firms and Software as a Service (SaaS) firms. The web service support from the leading data integration vendors plays a major role in fulfilling these needs.