Business users must delve deeply into the significance of data relevant to their business requirements, but traditional ETL approaches to EDM are typically not the most effective, since they are designed to move data between different data sources in order to populate Data Warehouses or Data Marts and to feed other data applications. When complex projects use an ETL approach, it is never possible to attain the significance of the data; it is only possible to check data consistency against the destination data model. Therefore,
ETL tools are more technically oriented than business oriented. Technically oriented tools are limited in the sense that they do not analyze the significance of data in relation to specific business requirements. Even a simple data integration or migration project needs a business perspective toward the data; in fact, it would still include:

- data extraction;
- data transformation;
- data classification;
- synchronization of different sources;
- multi-level and multi-step data quality checks;
- ...
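Written imperatively, each of these steps must be spelled out by hand. The following is a minimal sketch of such a pipeline; all function names, fields and rules are hypothetical illustrations, not an actual tool's API:

```python
# Minimal imperative sketch of the steps even a simple migration requires.
# All names and rules are hypothetical illustrations.

def extract(source):
    # Data extraction: pull raw rows from a source (here, an in-memory list).
    return list(source)

def transform(rows):
    # Data transformation: normalize field formats.
    return [{**r, "name": r["name"].strip().title()} for r in rows]

def classify(rows):
    # Data classification: tag each row with a business category.
    for r in rows:
        r["tier"] = "gold" if r["revenue"] >= 1000 else "standard"
    return rows

def quality_check(rows):
    # Data quality check: reject rows that fail basic rules.
    return [r for r in rows if r["name"] and r["revenue"] >= 0]

source_a = [{"name": " alice ", "revenue": 1500},
            {"name": "bob", "revenue": -10}]  # bad row, filtered out

rows = quality_check(classify(transform(extract(source_a))))
```

Note how ordering, intermediate state and error handling are all the developer's responsibility: changing one step means revisiting the whole chain.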
In either case, complex code must be written in the ETL tool, and technical and infrastructural problems must be managed: temporary storage, security contention, performance, and so on. It is clear that this type of approach requires enormous amounts of time and resources, which increases the project's complexity and duration and reduces its manageability. In this scenario, the main limitations of standard ETL tools are:

- When data comes from different sources and with different timings, you need to define a temporary, intermediate data model to store data that must be reused throughout the process (for example, data quality checks based on trend conditions or on different sources).
- Detailed data processing steps must be manually specified.
- The intermediate data structure must be manually maintained, leading to complex and unmanageable amounts of diagrams.
- Data quality checks become increasingly complex as they follow the ETL data streaming process.
- It is not possible to manage data quality checks from a business perspective.
- Managing multi-level and multi-step quality checks is extremely complex, as in data aggregation, data remapping and data restating for business reconciliation processes.
- Managing updates and changes to rules is cumbersome.
- You typically need to manage temporary storage areas, since isolated changes to process steps will affect the rest of the chain.
- User interfaces, enrichment, data adjustments, and adding comments or supporting documents are not available.
To approach business scenarios with extensive lists of needs, such as Data Integration & Transformation, Data Governance, Data Quality, Data Migration, Data Reconciliation, Data Aggregation & Reporting, Metadata Management, and so on, a more innovative approach must be considered. This approach must let you develop EDM solutions for business users that simplify all the tasks necessary for data management, so that users can focus only on the business rules.
To close the gap left by traditional tools, you can approach the problem in a completely different way. Instead of having to define exactly how to obtain your information, you should only need to ask for it. A goal-driven declarative approach (declarative ELT) lets the system do all the work for you. You simply define What you need, not How to get it: users do not need to define intermediate storage, data streams, parallelism, or sorting and joining algorithms. With a Declarative ELT approach:
- The user decides the result of the data transformation by defining rules and controls; the system takes care of data extraction and transformation.
- The system dynamically manages synchronization and temporary and parallel storage, and chooses the smartest optimized execution path.
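The "what, not how" idea can be illustrated with a toy rule set: the user only declares the rules, and a generic engine decides how to apply them to the data. This is a hypothetical sketch, not the API of any specific declarative ELT product:

```python
# Hypothetical declarative sketch: the user states WHAT must hold;
# a generic engine decides HOW to apply the rules to the rows.

RULES = {
    "revenue_non_negative": lambda r: r["revenue"] >= 0,
    "name_present": lambda r: bool(r["name"].strip()),
}

def apply_rules(rows, rules):
    # The "engine": the user never specifies ordering, storage or streams.
    passed, failed = [], []
    for r in rows:
        violated = [name for name, check in rules.items() if not check(r)]
        (failed if violated else passed).append((r, violated))
    return passed, failed

rows = [{"name": "Alice", "revenue": 1500},
        {"name": "", "revenue": -5}]
passed, failed = apply_rules(rows, RULES)
```

Adding, removing or reordering rules requires no change to the engine, which is exactly the maintainability benefit the declarative approach claims over hand-coded ETL streams.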
User interfaces simplify data transformation management and allow information archiving without the need to write code. This approach guarantees performance, architectural simplicity, business focus, implementation speed and ease of maintenance due to:
- single execution optimization;
- automatic maximization of parallel executions;
- data movement minimization;
- same server optimization & bulk mode;
- leverage for underlying RDBMS capabilities (statistical query optimization, data management, parallelism, I/O optimization, ...);
- multi-phase execution (e.g. dynamic parameters detection and handling);
- real-time collection of complete statistics on execution performance, degree of parallelism and data movement.