Matching Algorithms within a Duplicate Detection System

Authors: 
Monge, AE
Author: 
Monge, A
Year: 
2000
Venue: 
IEEE Data Engineering Bulletin
URL: 
http://www.acm.org/sigs/sigmod/disc/disc01/out/websites/deb_december/monge.pdf
Citations: 
91
Citations range: 
50 - 99
Attachment: 
Monge2000MatchingAlgorithmswithina.pdf
Size: 
38.74 KB

Detecting database records that are approximate duplicates, but not exact duplicates, is an important
task. Databases may contain duplicate records concerning the same real-world entity because of data
entry errors, unstandardized abbreviations, or differences in the detailed schemas of records from multiple
databases (as happens in data warehousing, where records from multiple data sources are
integrated into a single source of information), among other reasons. In this paper we review a system
to detect approximate duplicate records in a database and provide properties that a pair-wise record
matching algorithm must have for the duplicate detection system to be successful.
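
To make the idea of pair-wise matching of approximate duplicates concrete, here is a minimal sketch. It is not the algorithm reviewed in the paper: it assumes records are plain strings, compares them with the generic similarity ratio from Python's standard difflib module, and uses an arbitrary threshold of 0.8. The function names and the sample records are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(record.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def approximate_duplicates(records, threshold=0.8):
    """Return index pairs that are similar enough but not exact copies.

    The threshold is an assumption for this sketch, not a value from the paper.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if a != b and similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


# Made-up sample records: the first two describe the same person with an
# abbreviation and a spelling variation; the third is unrelated.
records = [
    "John A. Smith, 12 Main Street, Springfield",
    "Jon Smith, 12 Main St., Springfield",
    "Mary Jones, 4 Oak Avenue, Shelbyville",
]
print(approximate_duplicates(records))
```

Comparing every pair is quadratic in the number of records, so a sketch like this is only practical for small inputs; it illustrates the pair-wise matching step alone, not a complete detection system.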