Leveraging aggregate constraints for deduplication

Authors: 
Chaudhuri, S; Sarma, AD; Ganti, V; Kaushik, R
Year: 
2007
Venue: 
SIGMOD
URL: 
http://portal.acm.org/citation.cfm?id=1247480.1247530

We show that aggregate constraints (as opposed to pairwise constraints), which often arise when integrating multiple sources of data, can be leveraged to enhance the quality of deduplication. However, despite the appeal of this approach, we show that the problem is challenging, both semantically and computationally. We define a restricted search space for deduplication that is intuitive in our context, and we solve the problem optimally for this restricted space. Our experiments on real data show that incorporating aggregate constraints significantly enhances the accuracy of deduplication.
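The abstract does not spell out the algorithm. The following is therefore only a minimal Python sketch of the general idea under our own assumptions, not the authors' method. It takes a second source that supplies a known SUM for some entities and uses that aggregate constraint to choose among candidate deduplication partitions. The search space is restricted to partitions produced by single-linkage clustering at successive pairwise-similarity thresholds. This is an illustrative choice and not necessarily the restricted space defined in the paper. Within that space, the sketch searches exhaustively for the partition that satisfies the most constraints. All records, similarity scores, constraints, and function names (`partition_at`, `satisfied`, `best_partition`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data: record id -> (name, amount).
RECORDS: Dict[int, Tuple[str, int]] = {
    1: ("Acme Corp", 100),
    2: ("ACME Corporation", 50),
    3: ("Acme Co.", 25),
    4: ("Apex Inc", 70),
    5: ("Apex Incorporated", 30),
}

# Hypothetical pairwise similarity scores, standing in for a string-similarity function.
SIMILARITY: Dict[Tuple[int, int], float] = {
    (1, 2): 0.9, (1, 3): 0.8, (2, 3): 0.7,
    (4, 5): 0.85, (1, 4): 0.6, (3, 4): 0.5,
}

# Hypothetical aggregate constraints from a second source:
# (anchor record id, expected SUM of amounts over the anchor's duplicate group).
CONSTRAINTS: List[Tuple[int, int]] = [(1, 175), (4, 100)]


def partition_at(threshold: float) -> List[List[int]]:
    """Single-linkage clustering: merge every pair with similarity >= threshold."""
    parent = {rid: rid for rid in RECORDS}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), sim in SIMILARITY.items():
        if sim >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for rid in RECORDS:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(sorted(g) for g in groups.values())


def satisfied(groups: List[List[int]]) -> int:
    """Count constraints whose anchor's group sums to exactly the expected total."""
    count = 0
    for anchor, expected in CONSTRAINTS:
        group = next(g for g in groups if anchor in g)
        if sum(RECORDS[r][1] for r in group) == expected:
            count += 1
    return count


def best_partition() -> Tuple[float, List[List[int]], int]:
    """Search every threshold in the hierarchy; ties go to the higher threshold."""
    best: Optional[Tuple[float, List[List[int]], int]] = None
    for t in sorted(set(SIMILARITY.values()), reverse=True):
        groups = partition_at(t)
        score = satisfied(groups)
        if best is None or score > best[2]:
            best = (t, groups, score)
    assert best is not None
    return best


if __name__ == "__main__":
    t, groups, score = best_partition()
    print(t, groups, score)  # 0.8 [[1, 2, 3], [4, 5]] 2
```

In this toy data, a purely pairwise threshold fails in both directions. At 0.85 it misses "Acme Co.", and at 0.6 it wrongly merges Acme with Apex. The SUM constraints single out the 0.8 cut, which satisfies both constraints. Ties are broken toward the higher threshold, which is the more conservative choice because it performs fewer merges.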