Showing posts with label Data Management. Show all posts

Tuesday 16 October 2012

Collaborative Data Management – Need of the hour!

The topic may seem like a pretty old concept, yet it is a vital one in the age of Big Data, Mobile BI and Hadoop! According to the FIMA 2012 benchmark report, Data Quality (DQ) remains the top priority in data management strategy.

‘What gets measured improves!’ But a Data Quality (DQ) initiative is often a reactive strategy rather than a proactive one. Consider the impact bad data could have in a financial reporting scenario: a tarnished brand and loss of investor confidence.

But are business users aware of the DQ issue? A research report by The Data Warehousing Institute suggested that more than 80% of the business managers surveyed believed their business data was fine, but only half of their technical counterparts agreed! Given this disparity, it is a good idea to match the dimensions of data quality to the business problems created by a lack of it.

Data Quality Dimensions – IT Perspective

 

  • Data Accuracy – the degree to which data reflects the real world
  • Data Completeness – inclusion of all relevant attributes of data
  • Data Consistency – uniformity of data across the enterprise
  • Data Timeliness – is the data up to date?
  • Data Auditability – is the data reliable?
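
Two of these dimensions lend themselves to simple measurement. The sketch below is only an illustration: the record layout, the field names and the 365-day freshness threshold are assumptions made for the example, not something taken from the benchmark report.

```python
from datetime import date

# Hypothetical customer records; None marks a missing attribute.
records = [
    {"id": 1, "email": "a@example.com", "city": "Pune", "updated": date(2012, 10, 1)},
    {"id": 2, "email": None, "city": "Chennai", "updated": date(2011, 1, 15)},
    {"id": 3, "email": "c@example.com", "city": None, "updated": date(2012, 9, 20)},
    {"id": 4, "email": "d@example.com", "city": "Delhi", "updated": date(2012, 10, 10)},
]


def completeness(rows, fields):
    """Share of rows in which every listed field is populated."""
    complete = sum(1 for r in rows if all(r.get(f) is not None for f in fields))
    return complete / len(rows)


def timeliness(rows, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for r in rows if (as_of - r["updated"]).days <= max_age_days)
    return fresh / len(rows)


completeness_score = completeness(records, ["email", "city"])
timeliness_score = timeliness(records, date(2012, 10, 16), 365)
print(f"completeness={completeness_score:.2f} timeliness={timeliness_score:.2f}")
```

Which fields count as "relevant" and how old is "too old" are exactly the kind of thresholds that the business and IT would have to agree on together.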

 

Business Problems – Due to Lack of Data Quality

| Department/End-Users | Business Challenges | Data Quality Dimension* |
| Human Resources | The actual employee performance as reviewed by the manager is not in sync with the HR database; inaccurate employee classification based on government classification groups (minorities, differently abled) | Data consistency, accuracy |
| Marketing | Print and mailing costs associated with sending duplicate copies of promotional messages to the same customer/prospect, or sending them to the wrong address/email | Data timeliness |
| Customer Service | Extra call support minutes due to incomplete customer data and poorly defined metadata for the knowledge base | Data completeness |
| Sales | Lost sales due to lack of proper customer purchase/contact information, which paralyzes the organization's ability to perform behavioral analytics | Data consistency, timeliness |
| ‘C’ Level | Reports that drive top management decision making are not in sync with the actual operational data; difficulty getting a 360° view of the enterprise | Data consistency |
| Cross Functional | Sales and financial reports are not in sync with each other – typically data silos | Data consistency, auditability |
| Procurement | The procurement level of commodities is different from the requirement of production, resulting in excess/insufficient inventory | Data consistency, accuracy |
| Sales Channel | There are different representations of the same product across ecommerce sites, kiosks and stores, and the product names/codes in these channels are different from those in the warehouse system. This results in delays or wrong items being shipped to the customer | Data consistency, accuracy |

*Just a perspective; other dimensions could be causing these issues too.

As is evident, data is not just an IT issue but a business issue too, and it requires a ‘Collaborative Data Management’ approach (involving both business and IT) to ensure quality data. The solution is multifold, spanning the planning, execution and sustaining of a data quality strategy. Aspects such as data profiling, MDM and data governance are vital guards that help to analyze data, get first-hand information on its quality and maintain that quality on an ongoing basis.

Collaborative Data Management – Approach

Key steps in Collaborative Data Management would be to:

  • Define and measure metrics for data with the business team
  • Assess existing data against the metrics – carry out a profiling exercise with the IT team
  • Implement data quality measures as a joint team
  • Enforce a data quality firewall (MDM) as a governance process to ensure that only correct data enters the information ecosystem
  • Institute Data Governance and Stewardship programs to make data quality a routine and stable practice at a strategic level
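
To make the firewall step a little more concrete, here is a minimal sketch of a gate that checks incoming customer records against the master data before they are let in. The matching key (a normalized email), the rejection reasons and the function names are assumptions for illustration; a real MDM product would apply far richer matching rules.

```python
def normalize_email(email):
    """Lower-case and trim an email so trivially different spellings match."""
    return email.strip().lower()


def firewall(incoming, master_emails):
    """Split incoming records into accepted and rejected (with a reason)."""
    accepted, rejected = [], []
    seen = set(master_emails)
    for rec in incoming:
        email = rec.get("email")
        if not email or "@" not in email:
            rejected.append((rec, "missing or malformed email"))
            continue
        key = normalize_email(email)
        if key in seen:
            rejected.append((rec, "duplicate of existing master record"))
            continue
        seen.add(key)  # later records in the same batch are checked against it
        accepted.append(rec)
    return accepted, rejected


# Hypothetical master data and incoming batch.
master = {"asha@example.com"}
incoming = [
    {"name": "Asha", "email": " Asha@Example.com "},
    {"name": "Ravi", "email": "ravi@example.com"},
    {"name": "Meena", "email": ""},
]
accepted, rejected = firewall(incoming, master)
for rec, reason in rejected:
    print(rec["name"], "->", reason)
```

The rejected list, with its reasons, is what a data steward would review; the rules themselves are where the business team's input comes in.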

This approach would help ensure that the data ecosystem within a company is kept clean, as it involves business and IT users from each department at all levels of the hierarchy.

Thanks for reading; I would appreciate your thoughts.

 


 

Tuesday 18 March 2008

Data Integration Challenge – Initial Load – I


In a data warehouse, tables usually go through two phases of data loading: the initial load and the incremental load. The ‘History Load’ or ‘Initial Seeding/Load’ is a one-time load of past years' data from the source transaction system into the data management system. The process of adding only the new records (updates or insertions) to the data warehouse tables, either daily or at a predefined frequency, is called the ‘Incremental Load’. In addition, certain small and largely independent tables that receive full data (current data + history data) as input are loaded by means of a ‘Full Refresh’, which involves a complete delete and reload of the data.
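
The difference between the three load types can be sketched with an in-memory SQLite database. The table names, columns and the use of a last-updated date to detect changes are assumptions for this illustration; an actual ETL tool would handle change detection in its own way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, amount REAL, updated TEXT)"
)
conn.execute("CREATE TABLE dw_region (code TEXT PRIMARY KEY, name TEXT)")


def initial_load(rows):
    """One-time load of all historical rows into the empty fact table."""
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", rows)


def incremental_load(rows, last_loaded):
    """Apply only rows changed after the last load: insert new, overwrite existing."""
    changed = [r for r in rows if r[2] > last_loaded]  # ISO dates compare as text
    conn.executemany("INSERT OR REPLACE INTO dw_orders VALUES (?, ?, ?)", changed)
    return len(changed)


def full_refresh(rows):
    """Delete everything and reload; suited to small, independent tables."""
    conn.execute("DELETE FROM dw_region")
    conn.executemany("INSERT INTO dw_region VALUES (?, ?)", rows)


# Hypothetical source data.
initial_load([(1, 100.0, "2007-05-01"), (2, 250.0, "2007-11-12")])
source_now = [
    (1, 100.0, "2007-05-01"),
    (2, 300.0, "2008-03-17"),  # updated since the initial load
    (3, 75.0, "2008-03-17"),   # new since the initial load
]
applied = incremental_load(source_now, "2008-01-01")
full_refresh([("N", "North"), ("S", "South")])
full_refresh([("N", "North"), ("S", "South")])  # rerunning does not duplicate rows
print(applied, conn.execute("SELECT COUNT(*) FROM dw_orders").fetchone()[0])
```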

Code tables in particular usually undergo a one-time initial load and may not need a regular incremental load; incremental loads are common for fact tables. The initial load of a data warehouse system is quite a challenge in terms of completing it successfully within the planned timeframe. Some of the surprises or challenges faced in completing the history load are:
  1. Handling invalid records
  2. Data Reconciliation
  3. System performance
  4. Catching up
Handling Invalid Records:
Invalid records become much more prominent when we process history data, which was collected in the source system long ago and might not fit the current business rules. Records from a source system can be invalid in the data warehouse for several reasons, such as an invalid domain value for a column, a null value in a non-nullable field, or aggregate data that does not match the detail data. Ways of handling this problem effectively are:
  • Determine at the very start how many years of data are to be loaded into the data warehouse, and ensure that data profiling is performed on sample data from every year to be loaded. This ensures that most of the data validation rules are identified up front and built into the ETL process. In certain situations we may have to build separate data validation and transformation logic based on the year and the data.
  • In situations such as re-platforming or migrating an existing data warehouse to a new platform, before running the data through the regular ETL process we might need to load the old data into a data validation (staging) area, where it is analyzed and cleaned, and then load it into the data warehouse through the regular ETL process.
  • Design the ETL process to divert the key values of all invalid records to a separate set of tables. At some sites the customer only needs to be aware of the error records and is fine if they are not loaded into the current warehouse, but at other times the invalid records are corrected and reloaded.
  • For scenarios such as aggregate data not matching detail data, we would normally derive the aggregate from the detail data, but at times we might instead generate detail data to match the aggregate data.
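
Diverting the keys of invalid records could look roughly like the sketch below. The column names, the status domain and the list of non-nullable fields are hypothetical; in practice these rules would come out of the profiling exercise.

```python
VALID_STATUS = {"OPEN", "CLOSED"}        # hypothetical domain for the status column
REQUIRED = ("order_id", "customer_id")   # hypothetical non-nullable fields


def validate(row):
    """Return the list of rule violations for one source row (empty if valid)."""
    errors = [f"{field} is null" for field in REQUIRED if row.get(field) is None]
    if row.get("status") not in VALID_STATUS:
        errors.append(f"invalid status {row.get('status')!r}")
    return errors


def split_rows(rows):
    """Route valid rows to the load set and the keys of invalid rows to an error set."""
    to_load, error_rows = [], []
    for row in rows:
        errors = validate(row)
        if errors:
            # Keep only the key and the reasons, as an error table would.
            error_rows.append({"order_id": row.get("order_id"), "errors": errors})
        else:
            to_load.append(row)
    return to_load, error_rows


source = [
    {"order_id": 1, "customer_id": 10, "status": "OPEN"},
    {"order_id": 2, "customer_id": None, "status": "CLOSED"},
    {"order_id": 3, "customer_id": 12, "status": "PENDING"},
]
to_load, error_rows = split_rows(source)
for err in error_rows:
    print(err["order_id"], err["errors"])
```

Keeping the reasons next to the key makes it easier to decide later whether a record should simply be reported or corrected and reloaded.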
Data Reconciliation:
Once the initial load from the source system into the data warehouse has been completed, we have to validate that the data has been moved in correctly.
  • Loading records in groups separated by year, or by any logical grouping such as customer or product, gives better control over data validation. In general, validations such as count and sum checks should be tied to business-specific validation rules, for example that all customers from region ‘A’ belonging to division ‘1’ in the source should be classified under division ‘3’ in the current warehouse.
  • All the validations that need to be performed after the initial load for each data group have to be prepared and verified with the business team. Often the data is validated by the business through ad hoc queries, though the same checks can be run by the data warehouse team as an automated ETL process.
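
An automated count-and-sum check per load group might be sketched as follows. Grouping by year and representing rows as (year, amount) pairs are assumptions made for the example; business-specific rules such as the division mapping above would be added as further checks.

```python
from collections import defaultdict


def summarize(rows):
    """Return {year: (row_count, total_amount)} for (year, amount) rows."""
    totals = defaultdict(lambda: [0, 0.0])
    for year, amount in rows:
        totals[year][0] += 1
        totals[year][1] += amount
    return {year: (count, round(total, 2)) for year, (count, total) in totals.items()}


def reconcile(source_rows, warehouse_rows):
    """List the years whose count or sum differs between source and warehouse."""
    src, dwh = summarize(source_rows), summarize(warehouse_rows)
    return sorted(y for y in src.keys() | dwh.keys() if src.get(y) != dwh.get(y))


# Hypothetical data: one 2007 row did not make it into the warehouse.
source = [(2006, 100.0), (2006, 50.0), (2007, 75.5), (2007, 20.0)]
warehouse = [(2006, 100.0), (2006, 50.0), (2007, 75.5)]
mismatched = reconcile(source, warehouse)
print("Years needing investigation:", mismatched)
```

Because the comparison is done per group, a mismatch points straight at the year that needs to be reloaded or investigated rather than at the whole history.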
We shall discuss the other challenges in Part II.