Iterative record linkage for cleaning and integration

Wed, 09/13/2006 - 15:10 — Anonymous

Authors:

Bhattacharya, I; Getoor, L

Year:

2004

Venue:

Proc. 9th ACM workshop on Research in data mining and knowledge discovery

URL:

http://portal.acm.org/citation.cfm?id=1008694.1008697

Attachment	Size
Bhattacharya2004Iterativerecordlinkagefor.pdf	137.82 KB

Record linkage, the problem of determining when two records refer to the same entity, has applications both in data cleaning (deduplication) and in integrating data from multiple sources. Traditional approaches use a similarity measure that compares tuples' attribute values; tuples with similarity scores above a certain threshold are declared to be matches. While this method can perform quite well in many domains, particularly those where the data contain little noise, in some domains looking only at tuple values is not enough. By also examining the context of the tuple, i.e., the other tuples to which it is linked, we can reach a more accurate linkage decision. This additional accuracy comes at a price, however: to correctly find all duplicates, we may need to make multiple passes over the data, because each linkage that is discovered may in turn allow us to discover additional linkages. We present results that illustrate the power and feasibility of making use of join information when comparing records.
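
The multi-pass idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy name measure `attribute_similarity`, the Jaccard overlap used for context, the weight `alpha`, the `threshold`, and the sample author references are all assumptions chosen for illustration. Each record's context is the set of clusters its linked records currently belong to, so a merge found in one pass can raise the score of another pair in the next pass.

```python
from itertools import combinations


def attribute_similarity(name_a, name_b):
    """Hypothetical toy measure: 1.0 for identical names, 0.5 for the same
    surname and first initial, 0.0 otherwise."""
    if name_a == name_b:
        return 1.0
    parts_a, parts_b = name_a.split(), name_b.split()
    if parts_a[-1] == parts_b[-1] and parts_a[0][0] == parts_b[0][0]:
        return 0.5
    return 0.0


def find(parent, i):
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def iterative_linkage(names, links, alpha=0.7, threshold=0.6):
    """Merge records until a full pass over the data finds no new match.

    links[i] lists the records that record i is linked to (here, co-authors).
    Returns the clusters and the number of passes that produced merges.
    """
    n = len(names)
    parent = list(range(n))
    passes = 0
    while True:
        # Snapshot of the clustering at the start of this pass.
        cluster = [find(parent, i) for i in range(n)]
        context = [{cluster[j] for j in links[i]} for i in range(n)]
        merges = []
        for a, b in combinations(range(n), 2):
            if cluster[a] == cluster[b]:
                continue
            union = context[a] | context[b]
            ctx_sim = len(context[a] & context[b]) / len(union) if union else 0.0
            attr_sim = attribute_similarity(names[a], names[b])
            if alpha * attr_sim + (1 - alpha) * ctx_sim >= threshold:
                merges.append((a, b))
        if not merges:
            break
        for a, b in merges:
            parent[find(parent, a)] = find(parent, b)
        passes += 1
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values()), passes


# Made-up author references from two papers, plus one unrelated record.
names = ["J. Ullman", "A. Aho", "J. Ullman", "Alfred Aho", "J. Smith"]
links = [[1], [0], [3], [2], []]

with_context = iterative_linkage(names, links)
attributes_only = iterative_linkage(names, links, alpha=1.0)
```

With these assumed settings, the first pass links the two identical "J. Ullman" references. Only after that merge do "A. Aho" and "Alfred Aho" share a co-author cluster, which lifts their score above the threshold in the second pass. Setting `alpha=1.0` ignores context and leaves those two references unlinked.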
