Extending Relational Algebra to express one-to-many data transformations

**Paulo Carreira, Helena Galhardas, Antónia Lopes, João Pereira.**

Application scenarios such as legacy-data migration, ETL processes, data cleaning, and data integration require the transformation of input tuples into output tuples. Traditional approaches implement these data transformations either as Persistent Stored Modules (PSM) executed by an RDBMS or as transformation code developed with a commercial ETL tool. Neither of these solutions is easily maintainable or optimizable. To take advantage of the optimization capabilities of RDBMSs, data transformations are often expressed as relational queries. However, the limited expressive power of relational query languages like SQL hinders this approach. In particular, an important class of data transformations, those that produce several output tuples for a single input tuple, cannot be expressed as a relational query. In this paper, we present the formal definition of a new operator, named the data mapper operator, as an extension to the relational algebra that addresses this class of data transformations. We demonstrate that relational algebra extended with the mapper operator is more expressive than standard relational algebra. Furthermore, we investigate several properties of the operator and supply a set of algebraic rewriting rules, together with their proofs of correctness, that enable the logical optimization of expressions combining standard relational operators with mappers.
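To make the notion of a one-to-many transformation concrete, the following minimal Python sketch applies a tuple-to-many-tuples function to every tuple of an input relation and collects the results as a set. It is only an illustration and not the paper's formal definition of the mapper operator: the `Loan` and `Payment` schemas, the `split_loan` function, and the `mapper` helper are all hypothetical names introduced here.

```python
from typing import Callable, Iterable, NamedTuple, Set


class Loan(NamedTuple):
    """Hypothetical input schema: one tuple per loan."""
    account: str
    total: int
    installments: int


class Payment(NamedTuple):
    """Hypothetical output schema: one tuple per installment."""
    account: str
    seq: int
    amount: int


def mapper(f: Callable[[Loan], Iterable[Payment]],
           relation: Iterable[Loan]) -> Set[Payment]:
    """Apply f to each input tuple and take the union of all outputs.

    Each input tuple may contribute zero, one, or many output tuples.
    """
    return {out for t in relation for out in f(t)}


def split_loan(loan: Loan) -> list:
    """Produce one Payment per installment; no output for invalid loans."""
    if loan.installments <= 0:
        return []
    base, remainder = divmod(loan.total, loan.installments)
    # Spread the remainder over the first installments so amounts sum to total.
    return [
        Payment(loan.account, i + 1, base + (1 if i < remainder else 0))
        for i in range(loan.installments)
    ]


loans = {Loan("A1", 100, 3), Loan("B2", 50, 1), Loan("C3", 70, 0)}
payments = mapper(split_loan, loans)
```

In this sketch the number of output tuples per input tuple depends on the data itself (the `installments` value), which is the kind of transformation the abstract describes as not expressible as a relational query.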

http://www.lbd.dcc.ufmg.br/colecoes/sbbd/2005/010.pdf
