Managing source schema evolution in Web warehouses

Adriana Marotta, Regina Motz, Raúl Ruggia

Web Data Warehouses have been introduced to enable the analysis of integrated Web data. One of the main challenges in these systems is dealing with the volatile and dynamic nature of Web sources. In this work we address the effects on the Data Warehouse (DW) schema of adding, removing, and changing Web sources and their data items. By managing source evolution we mean automatically propagating these changes to the DW. The proposed approach is based on a wrapper/mediator architecture that minimizes the impact of Web source changes on the DW schema. This paper presents the architecture and analyses several novel evolution cases in the context of Web DWs.
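The abstract does not detail how the mediator absorbs source changes, so the following is only a minimal Python sketch under our own assumptions, not the authors' mechanism. It assumes a hypothetical `Mediator` that records which (source, data item) pairs feed each DW attribute. Changes confined to one source (a renamed item, or a removed source whose attribute is still fed by another source) are absorbed in the mapping, and only changes that alter the set of DW attributes are emitted as DW schema changes. The class, its methods, and the change tuples are all illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Mediator:
    """Hypothetical mediator: maps Web source data items to DW attributes."""

    # DW attribute name -> set of (source, item) pairs that feed it
    feeds: dict = field(default_factory=dict)
    # Current DW schema, modeled here as a plain set of attribute names
    dw_schema: set = field(default_factory=set)

    def add_item(self, source, item, dw_attr):
        """Register a source item; return DW schema changes (if any)."""
        self.feeds.setdefault(dw_attr, set()).add((source, item))
        if dw_attr not in self.dw_schema:
            self.dw_schema.add(dw_attr)
            return [("add_attribute", dw_attr)]
        return []  # attribute already exists: DW schema unchanged

    def rename_item(self, source, old, new):
        """A renamed source item only updates the mapping, never the DW."""
        for pairs in self.feeds.values():
            if (source, old) in pairs:
                pairs.discard((source, old))
                pairs.add((source, new))
        return []

    def remove_item(self, source, item):
        """Drop a DW attribute only when no remaining source feeds it."""
        changes = []
        for dw_attr in list(self.feeds):
            self.feeds[dw_attr].discard((source, item))
            if not self.feeds[dw_attr]:
                del self.feeds[dw_attr]
                self.dw_schema.discard(dw_attr)
                changes.append(("drop_attribute", dw_attr))
        return changes

    def remove_source(self, source):
        """Removing a source is treated as removing each of its items."""
        items = {p for pairs in self.feeds.values() for p in pairs if p[0] == source}
        changes = []
        for s, i in items:
            changes.extend(self.remove_item(s, i))
        return changes


m = Mediator()
print(m.add_item("siteA", "price", "price"))    # new DW attribute
print(m.add_item("siteB", "cost", "price"))     # absorbed by the mediator
print(m.remove_source("siteA"))                 # absorbed: siteB still feeds it
print(m.rename_item("siteB", "cost", "unit_cost"))
print(m.remove_item("siteB", "unit_cost"))      # last feed gone: drop attribute
```

In this sketch the first and last calls are the only ones that touch the DW schema; the middle three are absorbed by the mapping. This mirrors, in a simplified form, the abstract's goal of minimizing the impact of source changes on the DW schema.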

