
Monday 19 November 2012

XML Optimization through Custom Properties

1. Problem Statement:

I am creating an XML file as output. If my source is empty, is there a way to avoid creating an empty XML file?

Sample output data with source data:


 

Case 1: Empty Source – Creation of a Minimal XML File

We have to set the following properties of the XML target at the session level, under the Mapping tab:

  • Null Content Representation – “No Tag”
  • Empty String Content Representation – “No Tag”
  • Null Attribute Representation – “No Attribute”
  • Empty String Attribute Representation – “No Attribute”

The output file is as follows.
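For illustration, assuming a hypothetical Employees schema with a single nested Employee group, the minimal file would look something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <Employees>
        <Employee/>
    </Employees>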

Note: This generates a minimal XML file containing only the parent tags. The parent tags appear as unary (self-closing) tags in the browser.

Case 2: Creation of a Zero-Byte XML File

Even after setting all the above properties, you will still get an XML file with no data, or one containing only parent tags. If a downstream system such as MFT (Managed File Transfer) consumes this garbage file, you will end up with errors while processing. To avoid these kinds of errors, we have to set two custom properties on the Integration Service:

WriteNullXMLFile = No

The WriteNullXMLFile custom property skips creating an XML file when the XML Generator transformation or XML target receives no data. The default value for this property is Yes; if you set it to No, the minimal XML document is not generated and the target XML file is zero bytes in size.

 

2) Suppress the Empty Parent Tag

 

A PowerCenter session with an XML target writes empty parent tags to the XML file when all child elements are null.  This may occur even when the Null Content Representation option is set to No Tag in the session properties.

SuppressNilContentMethod = ByTree

The SuppressNilContentMethod custom property suppresses the parent tags as well as the child tags when all the child elements are null. To achieve this, set the property to “ByTree”.

 

 

ByTree

The ByTree setting suppresses non-leaf elements up to (but not including) the document root whenever the entire element chain originating at the affected element contains no data. ByTree is the generally recommended setting.

For example, suppose the Street1 and Street2 values are empty. Without the property, you still get output containing a unary Street tag.
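For illustration, assume a hypothetical Address group that also contains a populated City element (so that Address itself still carries data). The output would then be:

    <Address>
        <Street/>
        <City>Chennai</City>
    </Address>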

If you set SuppressNilContentMethod = ByTree, the entire Street tag vanishes.
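With the property set, the same hypothetical record would produce:

    <Address>
        <City>Chennai</City>
    </Address>

Because the entire chain under Street contains no data, ByTree removes the Street tag altogether; Address survives because City still carries data.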

3) To reduce the session log size when using an XML target

XMLWarnDupRows = No

By default this property is Yes, and the Informatica Server writes duplicate-row warnings, and the duplicate rows themselves, for XML targets to the session log. Setting it to No suppresses these entries and keeps the session log smaller.

4) To reduce the cache file size created by the XML target and improve the performance of reading large XML files.

XMLSendChildFirst = Yes

How to set the Custom Properties?

Infa 8.x and Above

1. Connect to the Administration Console

2. Stop the Integration Service

3. Select the Integration Service

4. Under the Properties tab, click Edit in the Custom Properties section

5. Under Name enter WriteNullXMLFile

6. Under Value enter No

7. Under Name enter SuppressNilContentMethod

8. Under Value enter ByTree

9. Click OK

10. Restart the Integration Service

Starting with PowerCenter 8.5, this change can also be made at the session task itself, as follows. Custom properties set at the session level override the Integration Service-level properties:

1. Edit the session

2. Select Config Object tab

3. Under Custom Properties, add the attributes WriteNullXMLFile=No and SuppressNilContentMethod=ByTree

4. Save the session

Session Properties:

Advanced Replication Setup for High availability and Performance

In my personal opinion, Oracle leads the market in Directory product offerings (LDAP Directories). From Oracle Internet Directory (OID) to the latest Oracle Unified Directory (OUD), Oracle provides a variety of LDAP Directory-related products for integration.

With the increasing demand for mobile and cloud computing offerings, there is a need to standardize LDAP deployments for Identification, Authentication and (sometimes) Authorization (IAA) services. With a highly scalable, highly performing, highly available, highly stable and highly secure LDAP Directory, these IAA services become easier to integrate with applications in the cloud or on mobile.

Introduction

Oracle Unified Directory (OUD) is the latest LDAP Directory offering from Oracle Corp. As mentioned in my previous post, OUD comes with three main components. They are:

  • Directory Server
  • Proxy Server
  • Replication Server

Here, Directory Server provides the main LDAP functionality (I assume you already know what an LDAP Directory Server is). Proxy Server is used to proxy LDAP requests. And Replication Server is used for replicating (copying) data from one OUD to another OUD, or even to an ODSEE server (we will talk more about replication in this post). You can read my first post on OUD here. In this article, I will write about the Replication Server and an advanced replication setup for Oracle Unified Directory.

Many people want a step-by-step guide (a kind of cheat sheet) for setting up something like OUD or OID replication. Unfortunately, I am not going to give you that here. In my personal opinion, a cheat sheet is not the right approach and will not be helpful in the long run for gaining concepts or knowledge. We first need to give importance to the basic concepts behind how something works.

First of all, read OUD Documentation

Product documentation must be read before you plan your deployment. You can find the OUD documentation here. This link is for OUD version 11.1.1; make sure to refer to the latest product manual. The documentation provides a lot of detail about the product and saves a lot of investigation time later. For replication, you need to start with the “Architecture Reference” guide.

When do you want to setup replication?

There should be a reason, right? If there is no reason, then there is no need for you to set up replication at all. Instead, you can have a beer and pass the time happily doing something else.

Ideally, you need a replication setup for “High Availability” and “Performance”. Usually, there will be multiple OUD Directory Server processes running in production. Let’s say we need around four OUD Directory Servers (and four more for Business Continuity/Disaster Recovery).

Unfortunately, there is no single process that updates all eight OUD Directory Servers in our example. We need a mechanism to synchronize the directory entries across these servers. For this, we use the OUD Replication Server component.

Securing the Replication Traffic

We don’t want network sniffers walking away with critical user information (this is possible even inside the internal network). We need to encrypt the traffic between the replication servers. Do not even consider setting up Replication Server communication without encrypting the traffic.

Since OUD provides identity data, all of its network traffic is prone to sniffing attacks. Always use encrypted or secure connections to OUD, or to any LDAP Directory.

Deciding a Replication Method to use

The next important thing is to decide which replication method you are going to use. This is mostly site-specific, and you need to know a lot of details before deciding on one. I am planning to use the following sample architecture for this post. Let’s understand it first.

 

Here are the key components of the architecture:

  • We have one master OUD server called PROD-01. All updates to the directory happen here. Most probably an HR system will update the directory. Updates can also happen through a custom-developed application plug-in for the LDAP Directory, or through an Identity and Access Management (IAM) system such as Oracle Identity Manager or Tivoli Identity Manager.
  • PROD-02 will be used with PROD-01 for High Availability and Performance in the production deployment.
  • In the Disaster Recovery deployment, we have the PROD-03 and PROD-04 servers. These servers need to synchronize user data from the master server PROD-01.

One way to keep the servers in sync is to have an Identity and Access Management (IAM) system (such as Oracle Identity Manager or Tivoli Identity Manager) provision users into all four OUD Directory Servers. However, this provisioning is time-consuming, because it amounts to updating four different LDAP Directories. A better way to achieve this is to use a Replication Server, as sketched below.
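As a sketch of what this looks like with the dsreplication utility that ships with OUD, the commands below enable replication of a hypothetical dc=example,dc=com suffix between PROD-01 and PROD-02, and then seed PROD-02 from PROD-01. The host names, ports, suffix and credentials are assumptions for this architecture; verify the exact options against your OUD version’s documentation.

    # Enable replication between PROD-01 and PROD-02; the
    # --secureReplication options encrypt the replication traffic,
    # as discussed in the previous section.
    $ dsreplication enable \
        --host1 prod-01.example.com --port1 4444 \
        --bindDN1 "cn=Directory Manager" --bindPassword1 ****** \
        --replicationPort1 8989 --secureReplication1 \
        --host2 prod-02.example.com --port2 4444 \
        --bindDN2 "cn=Directory Manager" --bindPassword2 ****** \
        --replicationPort2 8989 --secureReplication2 \
        --adminUID admin --adminPassword ****** \
        --baseDN "dc=example,dc=com"

    # Initialize (copy) the suffix data from PROD-01 to PROD-02.
    $ dsreplication initialize \
        --baseDN "dc=example,dc=com" \
        --adminUID admin --adminPassword ****** \
        --hostSource prod-01.example.com --portSource 4444 \
        --hostDestination prod-02.example.com --portDestination 4444

The same steps would then be repeated to bring PROD-03 and PROD-04 into the replication topology.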

We will continue setting up the Replication Server for this architecture in another post. Until then!

Transitioning to a New World – An Analytical Perspective

Recently, I had the opportunity to speak at the Silicon India Business Intelligence Conference. The topic I chose was the BI & Analytics perspective for companies transitioning to a new world. You can view my presentation at this link: http://bit.ly/VLDDfF

The gist of my presentation is given below:

1) First, I established the fact that the world is indeed changing, by showing some statistics:

  • Data Deluge: The amount of digital data created in the world now stands at around 7 Zettabytes per annum (1 Zettabyte = 1 billion Terabytes)
  • Social Media: Facebook has touched 1 billion users; if it were a country, it would be the 3rd largest in the world
  • Cloud: A tremendous amount of cloud infrastructure is being created
  • Mobility: There are 4.7 billion mobile subscribers, covering about 65% of the world’s population

2) Enterprises face a very different marketplace due to the profound changes taking place in the way people buy, sell, interact with one another and spend their leisure time.

3) To ensure that BI can help business navigate the new normal, there are 3 key focus areas:

  • Remove Bottlenecks – Give business what they want
  • Enhance Intelligence
  • End-to-End Visibility by strengthening the fundamentals

For each of the 3 areas mentioned above, I gave some specific examples of the trends in the BI space.

1) For removing bottlenecks, the impact of in-memory and columnar databases was elaborated.

2) For enhancing intelligence, working with unstructured data and using big data techniques were discussed.

3) For the 3rd point, the focus was on strengthening the fundamentals in the BI landscape.

Please do check out my complete presentation at http://bit.ly/VLDDfF and let me know your views.

Thanks for reading.

Tuesday 16 October 2012

Collaborative Data Management – Need of the hour!

Well, the topic may seem like a pretty old concept, yet it is a vital one in the age of Big Data, Mobile BI and the Hadoops! As per the FIMA 2012 benchmark report, Data Quality (DQ) still remains the topmost priority in data management strategy:

‘What gets measured improves!’ But often a Data Quality (DQ) initiative is a reactive strategy as opposed to a pro-active one; consider the impact bad data could have in a financial reporting scenario – a tarnished brand, loss of investor confidence.

But are business users aware of the DQ issue? A research report by ‘The Data Warehousing Institute’ suggested that more than 80% of the business managers surveyed believed their business data was fine, but only about half of their technical counterparts agreed! Given this disparity, it is a good idea to map the dimensions of data quality to the business problems created by its absence.

Data Quality Dimensions – IT Perspective

 

  • Data Accuracy – the degree to which data reflects the real world
  • Data Completeness – inclusion of all relevant attributes of data
  • Data Consistency – uniformity of data across the enterprise
  • Data Timeliness – is the data up to date?
  • Data Auditability – is the data reliable?

 

Business Problems – Due to Lack of Data Quality

Each row below lists the department/end-users, the business challenge they face, and the data quality dimension* involved:

  • Human Resources – Actual employee performance as reviewed by the manager is not in sync with the HR database; employees are inaccurately classified against government classification groups (minorities, differently abled). (Data consistency, accuracy)
  • Marketing – Print and mailing costs arise from sending duplicate copies of promotional messages to the same customer/prospect, or from sending them to the wrong address/email. (Data timeliness)
  • Customer Service – Extra call-support minutes are spent due to incomplete customer data and poorly defined metadata for the knowledge base. (Data completeness)
  • Sales – Sales are lost for lack of proper customer purchase/contact information, which paralyzes the organization’s behavioral analytics. (Data consistency, timeliness)
  • ‘C’ Level – Reports that drive top-management decision making are not in sync with the actual operational data, preventing a 360° view of the enterprise. (Data consistency)
  • Cross Functional – Sales and financial reports are not in sync with each other – typically data silos. (Data consistency, auditability)
  • Procurement – Procurement levels of commodities differ from production requirements, resulting in excess/insufficient inventory. (Data consistency, accuracy)
  • Sales Channel – The same product is represented differently across e-commerce sites, kiosks and stores, and the product names/codes in these channels differ from those in the warehouse system, resulting in delays or wrong items being shipped to the customer. (Data consistency, accuracy)

*Just a perspective, there could be other dimensions causing these issues too

As is evident, data is not just an IT issue but a business issue too, and it requires a ‘Collaborative Data Management’ approach (spanning business and IT) to ensure quality data. The solution is multifold, covering planning, executing and sustaining a data quality strategy. Aspects such as data profiling, MDM and data governance are vital guards that help analyze data, get first-hand information on its quality and maintain that quality on an ongoing basis.

Collaborative Data Management – Approach

Key steps in Collaborative Data Management would be to:

  • Define and measure metrics for data with the business team
  • Assess existing data against the metrics – carry out a profiling exercise with the IT team (see the sketch after this list)
  • Implement data quality measures as a joint team
  • Enforce a data quality firewall (MDM) as a governance process, to ensure that only correct data enters the information ecosystem
  • Institute Data Governance and Stewardship programs to make data quality a routine and stable practice at a strategic level
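As an illustration of the assessment step, here is a minimal profiling sketch in Python using pandas. The file name, column names and the one-year staleness threshold are assumptions for the example, not part of any particular toolset:

    import pandas as pd

    # Load a hypothetical customer extract.
    df = pd.read_csv("customers.csv")

    # Completeness: share of non-null values per attribute.
    completeness = df.notna().mean()

    # Consistency/duplication: repeated customer ids that would cause
    # duplicate mailings (the Marketing problem in the table above).
    duplicate_ids = df.duplicated(subset=["customer_id"]).sum()

    # Timeliness: records not refreshed within the last year
    # (assumes a last_updated column holding ISO dates).
    age_days = (pd.Timestamp.now() - pd.to_datetime(df["last_updated"])).dt.days
    stale_records = (age_days > 365).sum()

    print("Completeness by column:")
    print(completeness)
    print("Duplicate customer ids:", duplicate_ids)
    print("Stale records (>1 year):", stale_records)

Numbers like these give business and IT a shared, measurable baseline before the joint remediation steps above.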

This approach ensures that the data ecosystem within a company stays distilled, as it involves business and IT users from every department, at all levels of the hierarchy.

Thanks for reading, would appreciate your thoughts.

 
