cs.uwaterloo.ca

Modeling and Querying Possible Repairs in Duplicate Detection

Authors: Beskales, George; Soliman, Mohamed; Ilyas, Ihab; Ben-David, Shai
Year: 2009
Venue: VLDB

One of the most prominent data quality problems is the existence
of duplicate records. Current duplicate elimination procedures usually
produce one clean instance (repair) of the input data, by carefully
choosing the parameters of the duplicate detection algorithms.
Finding the right parameter settings can be hard, and in many cases,
perfect settings do not exist. Furthermore, replacing the dirty input
data with a single clean instance may result in unrecoverable
errors, for example when possible duplicate records in health care
systems are identified and merged.
