Data Transformation for Warehousing Web Data

Authors: 
Zhu, Yan; Bornhovd, Christof; Buchmann, Alejandro P.
Year: 
2001
Venue: 
Third International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS '01), 2001
URL: 
http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=885223
Citations: 
14
Citations range: 
10 - 49

To analyze market trends and make sound business plans, a company cannot rely on its local data alone. Decision making must also be based on information from suppliers, partners, and competitors. This external data can often be obtained from the Web, but it must be integrated with the company's own data, for example, in a data warehouse. To this end, Web data has to be mapped to the star schema of the warehouse. In this paper we propose a semi-automatic approach to support this transformation process. Our approach is based on a rooted labeled tree representation of both the Web data and the existing warehouse schema. Based on this common view, we can compare the source and target schemata to identify correspondences. We show how these correspondences guide the transformation so that it can be carried out automatically. We also explain the meaning of recursion and restructuring in mapping rules, which form the core of the transformation algorithm.
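The pipeline the abstract outlines (tree representation, schema comparison, correspondence-driven transformation) can be illustrated with a minimal, hypothetical Python sketch. It is not the authors' algorithm and does not reproduce their mapping rules. All names (`Node`, `match_schemas`, `transform`, the synonym table, and the catalog/PriceFact example) are assumptions made for illustration. Correspondences are proposed by a simple label heuristic (case-insensitive equality or a user-supplied synonym), standing in for the paper's comparison step. The transformation is recursive: it walks the Web data tree, and it restructures each nested record into one flat row for the fact table.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node of a rooted labeled tree: a label, an optional leaf value, ordered children."""
    label: str
    value: Optional[str] = None
    children: list = field(default_factory=list)


def leaf_paths(node, prefix=()):
    """Recursively collect the label path from the root to every leaf."""
    path = prefix + (node.label,)
    if not node.children:
        return [path]
    paths = []
    for child in node.children:
        paths.extend(leaf_paths(child, path))
    return paths


def match_schemas(source, target, synonyms):
    """Propose target-leaf -> source-path correspondences (hypothetical heuristic).

    Labels match if they are equal ignoring case, or if `synonyms` maps the
    target label to the source label. A user would review these proposals."""
    proposals = {}
    source_leaves = leaf_paths(source)
    for target_path in leaf_paths(target):
        t = target_path[-1].lower()
        for source_path in source_leaves:
            s = source_path[-1].lower()
            if s == t or synonyms.get(t) == s:
                proposals[target_path[-1]] = source_path
                break
    return proposals


def lookup(node, rel_path):
    """Follow a relative label path downward; return the leaf value or None."""
    for label in rel_path:
        node = next((c for c in node.children if c.label == label), None)
        if node is None:
            return None
    return node.value


def transform(instance, record_path, correspondences, prefix=()):
    """Recursively find every record subtree and flatten it into one fact row.

    Assumes every corresponding source path lies below `record_path`."""
    path = prefix + (instance.label,)
    if path == tuple(record_path):
        return [{column: lookup(instance, source_path[len(path):])
                 for column, source_path in correspondences.items()}]
    rows = []
    for child in instance.children:
        rows.extend(transform(child, record_path, correspondences, path))
    return rows


# Hypothetical example: a Web product catalog and a star-schema price fact.
source_schema = Node("catalog", children=[Node("product", children=[
    Node("title"), Node("price"), Node("vendor", children=[Node("company")])])])
target_schema = Node("PriceFact", children=[
    Node("Product", children=[Node("ProductName")]),
    Node("Supplier", children=[Node("SupplierName")]),
    Node("Price")])
corr = match_schemas(source_schema, target_schema,
                     {"productname": "title", "suppliername": "company"})

web_data = Node("catalog", children=[
    Node("product", children=[Node("title", "Laptop X"), Node("price", "999"),
                              Node("vendor", children=[Node("company", "Acme")])]),
    Node("product", children=[Node("title", "Phone Y"), Node("price", "499"),
                              Node("vendor", children=[Node("company", "Globex")])]),
])
rows = transform(web_data, ("catalog", "product"), corr)
for row in rows:
    print(row)
```

Running the sketch prints one flat row per `product` element, for example `{'ProductName': 'Laptop X', 'SupplierName': 'Acme', 'Price': '999'}`. The nested `vendor/company` value is lifted into the `SupplierName` column. Splitting these rows into separate dimension and fact tables with surrogate keys is deliberately left out of this sketch.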