The following design aspects help make a DI system dynamic:
- Avoiding hard-coded references by using parameter variables
- Using lookup tables for code conversion
- Setting and managing threshold values through tables
- Segregating data processing logic into common reusable components
- Ensuring that the required processes are controllable by the Business team with the required checks built in
We covered the first two aspects in the earlier post; let us now look at the scenarios and approach for the other three items.
Setting and managing threshold values through tables
In the data validation process we also verify the incoming data in terms of the count or sum of a variable. The derived count or sum is checked against a predefined number, usually called the ‘Threshold Value’. Some typical validations of this kind are listed below:
- The number of new accounts created should not be more than 10% (Threshold Value) of the total records
- The number of records received today and the number of records received yesterday cannot vary by more than 250 records
- The sum of the credit amount should not be greater than 100000
The threshold value differs across data sources, but in many cases the metric to be derived is similar across them. We can store these ‘threshold values’ in a relational table and integrate this ‘threshold’ table into the Data Integration process as a lookup table. This allows the same threshold-based data validation code to be implemented across different data sources while still applying the threshold value specific to each data source.
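As a rough illustration of the lookup-table idea, here is a minimal sketch in Python using an in-memory SQLite table. The table name `threshold`, its columns, the metric names, and the source names `source_a` and `source_b` are all assumptions made for this example, not part of any particular DI tool. The same `validate` function runs for every data source; only the looked-up threshold values differ.

```python
import sqlite3

def load_thresholds(conn, source):
    """Return {metric: threshold_value} for one data source (hypothetical schema)."""
    cursor = conn.execute(
        "SELECT metric, threshold_value FROM threshold WHERE data_source = ?",
        (source,),
    )
    return dict(cursor.fetchall())

def validate(metrics, thresholds):
    """Return the sorted names of metrics whose derived value exceeds its threshold."""
    return sorted(
        name for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    )

# Hypothetical threshold table; in a real system this would live in the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE threshold ("
    "data_source TEXT, metric TEXT, threshold_value REAL, "
    "PRIMARY KEY (data_source, metric))"
)
conn.executemany(
    "INSERT INTO threshold VALUES (?, ?, ?)",
    [
        ("source_a", "new_account_pct", 10.0),
        ("source_a", "daily_count_diff", 250),
        ("source_a", "credit_amount_sum", 100000),
        ("source_b", "new_account_pct", 5.0),
        ("source_b", "daily_count_diff", 500),
        ("source_b", "credit_amount_sum", 50000),
    ],
)

# Metrics derived from an incoming batch (made-up numbers for illustration).
metrics = {"new_account_pct": 8.0, "daily_count_diff": 300, "credit_amount_sum": 90000}

failures_a = validate(metrics, load_thresholds(conn, "source_a"))
failures_b = validate(metrics, load_thresholds(conn, "source_b"))
print(failures_a)
print(failures_b)
```

Changing a limit for one source is then a row update in the table rather than a code change.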
Segregating data processing logic into common reusable components
Having many reusable components in itself makes a DI system dynamic or adaptable, because reusable components rest on parameterizing the inputs and outputs of an existing process, and parameterization is key to making a DI system dynamic. Some key characteristics of a DI system that suggest where a reusable component can be carved out are:
- Multiple data sources providing data for a particular subject area like HR data coming from different HR systems
- Same set of data being shared with multiple downstream systems or a data hub system
- Existence of an industry standard format like SWIFT or HIPAA, either as source or target
- Integration with third party systems or their data, like D&B or Fair Isaac
- Changing data layouts of the incoming data structure
- Systems that capture survey data
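To make the link between reuse and parameterization concrete, here is a small Python sketch for the first case above, where several HR systems feed one subject area. The function `standardize`, the `FIELD_MAPS` dictionary, and all field and system names are hypothetical and chosen only for illustration. The component is written once, and each source contributes only a field map as its parameter.

```python
def standardize(record, field_map, defaults=None):
    """Map one source record onto a common layout using a per-source field map."""
    out = dict(defaults or {})
    for target_field, source_field in field_map.items():
        # Copy only the fields the source actually supplied.
        if source_field in record:
            out[target_field] = record[source_field]
    return out

# Hypothetical per-source parameters; these could equally be rows in a table.
FIELD_MAPS = {
    "hr_system_a": {"employee_id": "emp_no", "name": "full_name"},
    "hr_system_b": {"employee_id": "ID", "name": "EmployeeName"},
}

rec_a = standardize(
    {"emp_no": "1001", "full_name": "A. Kumar"}, FIELD_MAPS["hr_system_a"]
)
rec_b = standardize(
    {"ID": "77", "EmployeeName": "B. Lee"}, FIELD_MAPS["hr_system_b"]
)
print(rec_a)
print(rec_b)
```

Under these assumptions, adding a new HR source or absorbing a layout change means adding or editing a field map, while the processing logic stays untouched.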
Ensuring that the required processes are controllable by the Business team with the required checks built in
In many situations we now see requirements where the business provides regular inputs to the IT team running the DI systems. In these situations we can design the system so that portions of the DI system parameters are placed under business control. Typical examples of such scenarios are:
- In ‘threshold value’ based data validation, the values are business driven, i.e., the ‘threshold table’ can be managed by the business team, who can change it without code changes and without IT support
- In many scenarios invalid data undergoes multiple passes and needs to be validated by the business at different passes; the business participates by starting a BI session, and its input may be simply starting the process or also providing input data
- Data is pulled out of a warehouse based on a feed from an online application; a typical web service problem-solution
The need for the business team to control or feed the DI systems is common in companies that handle a lot of external data, such as market research firms and Software as a Service (SaaS) firms. Web service support from the leading Data Integration vendors plays a major role in fulfilling these needs.