Determining the error and handling the errors encountered in the process of
data transformation is one of the key design aspects in building a robust data integration platform. When an error occurs how do we capture the errors and use them for effective analysis. Following are the best practices related to error handling
- Differentiate the error handling process into the Generic (Null, Datatype, Data format) and the Specific like the rules related to the business process. This differentiation enables to build reusable error handling code
-
- Do not stop validations when the record fails for one of the validations; continue with the other validations on the incoming data. If we have 5 validations to be done on a record, we need to design that the incoming record is taken through all the validations, this ensures that we capture all the errors in a record in one go
-
- Have a table Error_Info; this has the repository of all the error messages. The fields would be ErrorCode, ErrorType and the ErrorMessage. The ErrorType would carry the values ‘warning’ or ‘error’, the ErrorMessage would have a detail description of the error and the ErrorCode a numeric value which is used in place of the description.
-
- In general each validation should have an error message, we could also see the table Error_Info as a repository of all error validations performed in the system. In case of business rules that involve multiple fields, the field ErrorMessage in the table Error_Info can have the details of the business rule applied along with the field name, we can also create an additional field Error_Category to group the error messages
-
- Have a table Error_Details; this stores the errors captured. The fields of this table would be KeyValue, FieldName, FieldValue and ErrorCode. The KeyValue would hold the value of the primary key of the record which has an error, the FieldName would store name of the field which has an error, the FieldValue has the value of the field which has failed or is an error, the ErrorCode details the error through a link to the table Error_Info.
-
- Write each error captured as a separate record in the table Error_Deatils i.e., if a record fails for two conditions like a NULL check on field ‘ CustomerId’ and the data format check on the field ‘Date’ then ensure we write two records one for the NULL failure and one for the data format failure
-
- To retain the whole incoming record have a table structure Source_Datasame as the incoming data. Have a field FLAG in the Source_Data, a value of ‘1’ would say that the record has passed all the validations and ‘0’ would say that it has failed one or more validations
-
In summary the whole process would be to read the incoming record, validate the data, for any validation failure assign the error_code and pipe the errors captured to the Error_Details table, once all validations completed assign the FLAG value (1- Valid record, 0-Invalid record) and insert that record into the Source_data table. Having the data structure as suggested above would enable efficient analysis of the errors captured by the business and IT team.