
Monday, 25 August 2008

To Build or Buy? – The Answer is ROI

For Business Intelligence project managers, sponsors and decision makers, things are getting a lot more interesting (and complicated) with the advent of packaged BI applications. Packaged BI is not new, but the domain has been getting a big push in recent years from all the major enterprise application vendors.
The logic behind packaged BI looks sound and bullet-proof. It goes like this: the enterprise application vendors understand the business aspects very well and have handled complexity of a high order. That collective experience, built over many years, has been distilled into specific BI solutions (Financials, Supply Chain, Operations, Sales, etc.), which come packaged with data models, pre-built ETL jobs, standardized reports and high-end predictive analytics. For an example, take a look at this blog describing the packaged BI Applications from Oracle.
So what's the problem? Why can't everybody just buy packaged BI applications and live happily ever after?
It appears that the choice is not so simple. Packaged BI has certain drawbacks, some of which are outlined below:
  • Packaged BI imposes a certain way of capturing business entities and metrics (euphemistically termed "best practices"), which might go against an organization's way of doing things.
  • The pre-packaged data integration (ETL) jobs stay relevant only for a plain-vanilla implementation of the enterprise apps.
  • Customizations made to the transaction systems require corresponding customizations to the pre-packaged ETL jobs and reports, which involves considerable effort and is error-prone.
  • Packaged BI apps come with embedded ETL and reporting tools that might differ from the tools already chosen as enterprise standards.
  • From my own experience, packaged BI comes with so many entities and attributes for each domain that it appears "bloated" for companies taking a first step into analytics for that particular domain.
Ultimately, BI decision makers are grappling with the question of "Build or Buy": should I build the BI application from scratch or buy one of the packaged applications? One way to resolve this is to build a strong ROI (Return on Investment) framework for BI initiatives in your organization. ROI is computed by dividing the Net Present Value (NPV) of the cash flows over a time horizon by the initial investment. The details of the ROI computation and Hexaware's proprietary tool for financial assessments in BI will be discussed in subsequent blogs. For now, let's assume that you have computed the ROI for a build solution and also for a packaged BI solution. Once this is done, the choice becomes a little clearer: if the ROI for the packaged BI solution is better than expected and the organization can manage the typical pains of implementing a packaged solution, then consider the "Buy" option; else look at the "Build" option.
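To make the arithmetic concrete, here is a minimal sketch of that calculation in Python: the NPV of the projected cash flows divided by the initial investment. The discount rate, cash-flow figures and the build/buy horizons are purely illustrative assumptions, not output from any actual assessment tool.

```python
def npv(cash_flows, discount_rate):
    """Net Present Value of yearly cash flows, discounted from year 1 onward."""
    return sum(cf / (1 + discount_rate) ** year
               for year, cf in enumerate(cash_flows, start=1))

def roi(cash_flows, initial_investment, discount_rate=0.10):
    """ROI as defined above: NPV of cash flows over the horizon / initial investment."""
    return npv(cash_flows, discount_rate) / initial_investment

# Illustrative numbers only: a 3-year horizon for a "Build" option
# versus a 5-year horizon for a packaged ("Buy") option.
roi_build = roi([400_000, 450_000, 500_000], initial_investment=1_000_000)
roi_buy = roi([300_000, 500_000, 600_000, 650_000, 700_000], initial_investment=1_500_000)
print(f"Build ROI: {roi_build:.2f}  Buy ROI: {roi_buy:.2f}")
```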
Now here comes a little twist: in my experience, I have seen customers look at a shorter time horizon, where the ROI of a build solution is typically higher, and then move on to a buy solution with a longer time frame in mind. The extra advantage of this approach is that the organization understands its analytical needs much better before implementing a packaged BI solution. So it is not strictly a "Build vs. Buy" question; it can also be a "Build and Buy" scenario.
Thanks for reading. Please do share your thoughts.

Thursday, 14 August 2008

End Point in the Business Intelligence Value Chain

An interesting aspect of Business Intelligence is the fact that there are many end-points possible in a BI Value Chain. Let me explain a bit here and build a case for creating "Reference Architectures" in the BI domain.
In my view, there are typically 5 different configurations for the BI Value Chain that lead to 5 possible end-points. They are:
End Point 1: Reporting and Ad-hoc Analysis
This is the most common type of enterprise BI landscape. The objective here is to provide business users with standardized reports and ad-hoc analysis capabilities to analyze the business. With that objective in mind, data warehouses and/or data marts are created as data repositories and semantic layers for analysis flexibility.
End Point 2: Data Hub or Master Data Repository
This is a scenario where the objective is to consolidate data and create master data repositories. The consumption of this master data is typically left to individual consumers to figure out for themselves. The complexity in this type of configuration lies more in the data quality and governance mechanisms around the data hub, as the business value increases only if more systems utilize the data hub.
End Point 3: Source Systems
This configuration indicates a fairly mature landscape, where the feedback loop from the analytical systems to the operational ones is in place. The concept of Operational BI is built on this foundation: data from the transaction systems goes through the analytical layers, gets enriched, and reaches its place of origination with the intent of helping the business make better-informed transactional decisions.
End Point 4: Data Mining models
This is a configuration that helps organizations compete on analytics. Integrated, subject-oriented, cleansed data taken out of the data warehouses/marts is fed into data mining models in a seamless fashion. The results obtained from the data mining exercise are used to optimize business decisions.
End Point 5: Simulations
Here is a configuration that I haven't seen in practice but have a strong feeling will be the future of BI. I have some experience working with simulation tools (Powersim and Promodel, to name a few), where the idea is to create a model of the business with appropriate leads, lags, dependencies, etc. The starting criteria (the set of initial parameters) would typically be fed in by a business analyst, and the output of the model would indicate the state of the business (or the specific business area being modeled) after a period of time. Given this context, I think it would be more powerful to have the simulation models fed with data from the analytical systems in an automated fashion. Presuming that the simulation models are built correctly by experts in that particular area, the output tends to be a better illustration of the future state of the business than a "gut feel" extrapolation.
Outlined above are the 5 different configurations of BI systems. The logical next step from the technology standpoint is to publish reference architectures for each of these configurations. This would help organizations get an idea of the components involved once they decide on a particular configuration for their business.
Reference Architectures and Simulations in BI environments are areas that will be explored more in the subsequent posts.
Thanks for reading. Have a great day!

Tuesday, 5 August 2008

Business Intelligence Challenge – Understanding Requirements, User Object Analysis

Let us start with the Law of (BI) Requirements: "Requirements cannot be created nor destroyed; they can only be transformed from one form to another." The thought is that in all customer environments the requirements for a BI system are always available in some form or the other. We need to find the 'base object form' of the requirement and build upon it for further improvement.
In general, the data in every transaction system gets analyzed and reported in one way or another; a BI system is built to make that process of analysis easier and more sophisticated. Requirements 'understanding' has typically been done through questionnaires, interviews and joint discussions. These kinds of requirements gathering can miss things the user actually needs, because we might not ask the right questions, the user may not be in the right frame of mind during the discussion, or the user may only provide the details he can remember at that point in time. When we are talking about thousands of users located across the globe, the challenge becomes much bigger.
One way to cover all aspects of requirements understanding from the user's perspective is to analyze the objects that a user 'creates or uses' in his day-to-day activities; we can call this 'User Object Analysis'.
A 'User Object' is any artifact that a user creates as part of his data preparation, analysis and reporting; it could be an Excel workbook, a PowerPoint slide, an Access database, a Word document, a notepad file or an e-mail.
Following are the steps in 'User Object Analysis':
  • Collect all the 'objects' from all the users. The objects collected can go back several years, but the key is to collect everything the user feels is relevant and applicable.
  • Convert the content of each 'user object' into a relational structure. The conversion process involves mapping the data in the objects to its metadata, such as business names/elements, tables and columns, user name, department, etc. (a rough sketch of this step follows the list).
  • Analyze the collected metadata. This gives a wider view, enables questioning, helps us understand the needs of the users, and lets us define improvements or provide another perspective on the existing analysis.
  • Prepare and submit the 'User Object Analysis' report highlighting the needs of each user (or user cluster) and obtain the user's confirmation.
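As an illustration of the second step, here is a rough sketch of how Excel-based 'user objects' could be flattened into a relational metadata structure (user, object, sheet, data element). The folder layout, the one-sub-folder-per-user convention and the pandas-based approach are assumptions made for this example, not a prescribed method.

```python
import os
import pandas as pd

def extract_object_metadata(root_folder):
    """Walk a folder of collected 'user objects' (Excel files in this sketch)
    and flatten each one into rows of metadata: user, object, sheet, element."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root_folder):
        for name in filenames:
            if not name.lower().endswith((".xls", ".xlsx")):
                continue  # only Excel objects are handled in this sketch
            owner = os.path.basename(dirpath)  # assumption: one sub-folder per user
            sheets = pd.read_excel(os.path.join(dirpath, name),
                                   sheet_name=None, nrows=0)  # read headers only
            for sheet_name, frame in sheets.items():
                for column in frame.columns:
                    rows.append({"user": owner, "object": name,
                                 "sheet": sheet_name, "element": str(column)})
    return pd.DataFrame(rows)

# Grouping the metadata by data element shows which elements are used by many
# users, which helps carve out user clusters with similar analysis needs.
metadata = extract_object_metadata("collected_user_objects")
print(metadata.groupby("element")["user"].nunique().sort_values(ascending=False).head(10))
```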
Benefits of User Object Analysis
  • An effective means to understand the needs of a user based on what he does as a daily routine
  • An easy way for the user, as he just has to read through the final report for approval and need not spend time providing inputs through questionnaires or discussions
  • Easily managed even when users are large in number or spread across multiple locations
  • A good base for us to define improvements to the existing process of analysis
  • A platform to consolidate the needs across multiple users and carve out the user clusters who perform the same kind of analysis
  • Enables us to think through the business process and improves business understanding
Next time, let us discuss another perspective on requirements understanding called 'System Object Analysis'.

Tuesday, 22 July 2008

Business Intelligence Landscape Documentation – The Gold Copy

Software systems, in general, need good, comprehensive documentation. This need becomes a "must-have" in the case of BI systems, as these systems evolve over a period of time. My recommendation (a best practice, if you will) is to create a "Gold Copy" documentation of the enterprise BI landscape. The "Gold Copy" is distinct from the individual project documentation (which would continue to exist) and is updated with appropriate version controls over a period of time.
The "Gold Copy" would comprise the following documents related to the Business Intelligence environment:
1) Architecture and Environment
This document has two broad sections. The first section describes the physical architecture and then examines each of the architectural components in more detail. The second section covers the environmental and infrastructure requirements for development, production and operations.
2) Data Sources
This document explores the various source systems from which data is acquired by the data warehouse. It attempts to provide a layman's view of what data is being extracted and maps it to the source system data structures where the data originates.
3) Data Models
This document builds on the two previous documents. It maps the source data onto the data warehouse and data mart models and provides a high-level picture of the subject orientation.
4) Reporting and OLAP
This document takes a closer look at the processes and technology that deliver information from the data warehouse to the end user. The initial section explores the architectural components, covering the middleware server software and hardware as well as the end-user desktop tools and utilities used for this purpose. The second section looks at some of the more important reports and describes the purpose of each of them.
5) Process
This document takes a process view of data movement within the enterprise data warehouse (EDW), starting from extraction and staging through loading the EDW and subsequently the data marts. It explores related aspects such as mode of extraction, extract strategy, control structures, and restart and recovery. The document also covers the naming conventions followed for all objects within the data warehouse environment.
The above set of documents covers the BI landscape with a focus on three critical themes: the Architecture Track, the Data Track and the Process Track. Each of these tracks has a suggested reading sequence of the above-mentioned documents.
Architecture Track – This theme focuses entirely on components, mechanisms and modes from an architectural angle. The suggested reading sequence for this track is: Architecture and Environment, Data Models, Reporting and OLAP.
Data Track – This theme focuses on data: the methods of its sourcing, its movement across the data warehouse, the methods of its storage and the logistics of its delivery to the business users. The suggested reading sequence for this track is: Data Sources, Data Models, Reporting and OLAP.
Process Track – This theme focuses on the data warehouse from a process perspective and explores the different aspects related to it. The suggested reading sequence for this track is: Architecture and Environment, Process, Reporting and OLAP.
I have found it extremely useful to create such documentation for enterprise-wide BI systems to ensure a level of control as functional complexity increases over a period of time.
Thanks for reading. Please do share your thoughts.

Friday, 18 July 2008

Data Integration Challenge – Error Handling

Detecting and handling the errors encountered in the process of data transformation is one of the key design aspects of building a robust data integration platform. When an error occurs, how do we capture it and use it for effective analysis? Following are some best practices related to error handling:
  1. Differentiate the error handling process into the generic (Null, datatype, data format) and the specific, such as rules related to the business process. This differentiation enables us to build reusable error handling code.
  2. Do not stop validations when a record fails one of the validations; continue with the other validations on the incoming data. If we have 5 validations to be performed on a record, we need to design so that the incoming record is taken through all of them; this ensures that we capture all the errors in a record in one go.
  3. Have a table Error_Info; this is the repository of all the error messages. The fields would be ErrorCode, ErrorType and ErrorMessage. ErrorType carries the value 'warning' or 'error', ErrorMessage holds a detailed description of the error, and ErrorCode is a numeric value used in place of the description.
  4. In general, each validation should have an error message, so the table Error_Info can also be seen as a repository of all error validations performed in the system. In the case of business rules that involve multiple fields, the ErrorMessage field in Error_Info can hold the details of the business rule applied along with the field names; we can also create an additional field Error_Category to group the error messages.
  5. Have a table Error_Details; this stores the errors captured. The fields of this table would be KeyValue, FieldName, FieldValue and ErrorCode. KeyValue holds the value of the primary key of the record which has the error, FieldName stores the name of the field in error, FieldValue holds the value of the field that failed, and ErrorCode links the error to its description in the table Error_Info.
  6. Write each error captured as a separate record in the table Error_Details, i.e., if a record fails two conditions, such as a NULL check on the field 'CustomerId' and a data format check on the field 'Date', then ensure we write two records: one for the NULL failure and one for the data format failure.
  7. To retain the whole incoming record, have a table Source_Data with the same structure as the incoming data. Have a field FLAG in Source_Data; a value of '1' says that the record has passed all the validations and '0' says that it has failed one or more of them.
In summary, the whole process is to read the incoming record, validate the data, assign the appropriate ErrorCode for any validation failure and pipe the errors captured into the Error_Details table; once all the validations are completed, assign the FLAG value (1 for a valid record, 0 for an invalid record) and insert the record into the Source_Data table. Having the data structures suggested above enables efficient analysis of the captured errors by the business and IT teams.
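Here is a minimal sketch of how the tables and the validation flow described above could fit together, using SQLite and two generic checks (a NULL check and a date-format check). The specific columns of the incoming feed, the error codes and the checks themselves are illustrative assumptions.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Error_Info    (ErrorCode INTEGER PRIMARY KEY, ErrorType TEXT, ErrorMessage TEXT);
CREATE TABLE Error_Details (KeyValue TEXT, FieldName TEXT, FieldValue TEXT, ErrorCode INTEGER);
CREATE TABLE Source_Data   (CustomerId TEXT, OrderDate TEXT, Amount TEXT, FLAG INTEGER);
""")
conn.executemany("INSERT INTO Error_Info VALUES (?, ?, ?)", [
    (1, "error", "Mandatory field is NULL"),
    (2, "error", "Invalid date format, expected YYYY-MM-DD"),
])

def is_valid_date(value):
    try:
        datetime.strptime(value or "", "%Y-%m-%d")
        return True
    except ValueError:
        return False

# Generic validations as (field, check, error code). Every check is applied to
# every record so that all failures are captured in a single pass.
VALIDATIONS = [
    ("CustomerId", lambda r: r["CustomerId"] is not None, 1),
    ("OrderDate",  lambda r: is_valid_date(r["OrderDate"]), 2),
]

def load_record(record, key_field="CustomerId"):
    # One Error_Details row per failed validation, then the full record with a FLAG.
    errors = [(str(record[key_field]), field, str(record[field]), code)
              for field, check, code in VALIDATIONS if not check(record)]
    conn.executemany("INSERT INTO Error_Details VALUES (?, ?, ?, ?)", errors)
    flag = 1 if not errors else 0  # 1 = passed all validations, 0 = failed one or more
    conn.execute("INSERT INTO Source_Data VALUES (?, ?, ?, ?)",
                 (record["CustomerId"], record["OrderDate"], record["Amount"], flag))

for rec in [{"CustomerId": "C001", "OrderDate": "2008-07-18", "Amount": "120.50"},
            {"CustomerId": None,   "OrderDate": "18/07/2008", "Amount": "80.00"}]:
    load_record(rec)
conn.commit()
```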