Improving data cleaning quality using a data lineage facility

Authors: 
Galhardas, H; Florescu, D; Shasha, D; Simon, E; Saita, CA
Year: 
2001
Venue: 
Proc. Int. Workshop on Design and Management of Data Warehouses (DMDW)
URL: 
http://web.tagus.ist.utl.pt/~helena.galhardas/DMDW.pdf.gz
Citations: 
30
Attachment: 
Galhardas2001Improvingdatacleaningqualityusingadatalineagefacility.pdf (105.49 KB)

The problem of data cleaning, which consists of
removing inconsistencies and errors from original
data sets, is well known in the area of decision
support systems and data warehouses. However,
for some applications, existing ETL (Extraction
Transformation Loading) and data cleaning
tools for writing data cleaning programs are insufficient.
One important challenge with them is the
design of a data flow graph that effectively generates
clean data. A more general difficulty is the lack
of explanations for cleaning results and of user interaction
facilities for tuning a data cleaning program.
This paper presents a solution to handle this problem
by enabling users to express user interactions
declaratively and tune data cleaning programs.