Friday 1 June 2007

What is Data Integration or ETL?


ETL stands for three basic steps:
  1. Extraction of data from a source system
  2. Transformation of the extracted data
  3. Loading of the transformed data into a target environment
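The three steps can be sketched in a few lines of Python. This is only a minimal illustration, not a production design: the CSV text, the `payments` table, and the function names `extract`, `transform`, `load` and `run_etl` are all assumptions made up for the example, with an in-memory SQLite database standing in for the target environment.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a flat file.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2, bob ,20.00
3, carol ,not_a_number
"""


def extract(text):
    """Extraction: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: tidy names, convert amounts, skip invalid rows."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount is not numeric
        cleaned.append((int(row["id"]), row["name"].strip().title(), amount))
    return cleaned


def load(records, conn):
    """Loading: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order against the given target connection."""
    load(transform(extract(SOURCE_CSV)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # assumed target environment
    run_etl(conn)
    print(conn.execute("SELECT * FROM payments").fetchall())
```

Keeping each step in its own function means the source or the target can be swapped without touching the transformation logic.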

Traditionally, ‘ETL’ referred mostly to a batch process that gathered data from flat files or relational structures. When ETL systems began supporting a wider range of sources, such as XML, industry-standard formats like SWIFT, unstructured data, and real-time feeds like message queues, ‘ETL’ evolved into ‘Data Integration’. That is why ETL product vendors are now called Data Integrators.
Now let us see how Data Integration (DI) or ETL has evolved over time. There are three ways of performing DI:
  • Write Code
  • Generate Code
  • Configure Engine
Write Code: Write a program in a programming language, then compile and execute it.
Generate Code: Use a graphical user interface to specify the data movement requirements, generate code in a programming language, then compile and execute it.
Configure Engine: Use a graphical user interface to specify the requirements and save them as metadata in a data store (the repository). A generic, precompiled engine then interprets the metadata from the repository and executes it.
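The difference between the last two approaches can be illustrated with a small Python sketch. Everything here is hypothetical: the `MAPPING` metadata format, the `CONVERTERS` table, and the function names are invented for the example and do not reflect any particular vendor's product. The same metadata is first turned into program text that is compiled and run (Generate Code), then read from a stand-in repository and interpreted directly by a generic function (Configure Engine).

```python
import json

# Hypothetical metadata describing one data movement: which source fields
# to copy, what to call them in the target, and which conversion to apply.
MAPPING = {
    "fields": [
        {"source": "cust_name", "target": "name", "convert": "upper"},
        {"source": "amt", "target": "amount", "convert": "float"},
    ]
}

# Conversions that both approaches may refer to by name.
CONVERTERS = {"upper": str.upper, "float": float}


# Generate Code: turn the metadata into program text, then compile and run it.
def generate_code(mapping):
    lines = ["def move(row):", "    out = {}"]
    for f in mapping["fields"]:
        lines.append(
            f"    out[{f['target']!r}] = "
            f"CONVERTERS[{f['convert']!r}](row[{f['source']!r}])"
        )
    lines.append("    return out")
    return "\n".join(lines)


def compile_generated(source):
    """Compile the generated text and return the resulting move() function."""
    namespace = {"CONVERTERS": CONVERTERS}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["move"]


# Configure Engine: a generic, already written engine reads the metadata
# from the repository at run time and interprets it; no code is generated.
def engine_run(repository_json, row):
    mapping = json.loads(repository_json)
    return {
        f["target"]: CONVERTERS[f["convert"]](row[f["source"]])
        for f in mapping["fields"]
    }


if __name__ == "__main__":
    row = {"cust_name": "alice", "amt": "10.5"}
    move = compile_generated(generate_code(MAPPING))
    repository = json.dumps(MAPPING)  # stands in for the metadata repository
    print(move(row))
    print(engine_run(repository, row))
```

Both calls produce the same record. In the generated variant the deployed artifact is the program text, which can drift from the metadata if only one of them is updated; in the engine variant the metadata is the only thing that changes.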
Pros and Cons of each approach

Pros
Write Code:
  • Easy to get started for smaller tasks
  • Complex data handling requirements can be met
Generate Code:
  • Developer-friendly way to design the requirements
  • Metadata of the requirements is captured
Configure Engine:
  • Developer-friendly way to design the requirements
  • Metadata of the requirements is captured
  • Easier code maintenance
  • Flexibility to access any type of data source
  • Scalable to huge data volumes; supports architectures such as SMP, MPP, NUMA-Q and grid

Cons
Write Code:
  • Large effort to maintain the code
  • Labor-intensive, error-prone and time-consuming development
Generate Code:
  • Large effort to maintain the code
  • Metadata and deployed code can get out of sync
Configure Engine:
  • Certain data handling requirements might still require hand-written code
  • Requires a dedicated environment, servers and an initial configuration process
