Efficient similarity-based operations for data integration

Fri, 04/20/2007 - 13:58 — cat

Authors:

Schallehn, E; Sattler, KU; Saake, G

Year:

2004

Venue:

Data & Knowledge Engineering

URL:

http://portal.acm.org/citation.cfm?id=985240

Citations range:

10 - 49

Attachment	Size
Schallehn2004Efficientsimilaritybasedoperationsfordataintegration.pdf	11.7 KB

Dealing with discrepancies in data is still a major challenge in data integration systems. The problem arises both when eliminating duplicates from semantically overlapping sources and when combining complementary data from different sources. Although SQL operations such as grouping and join seem to offer a viable approach, they fail if the attribute values of potential duplicates or related tuples are not equal but only similar according to certain criteria. As a solution to this problem, we present similarity-based variants of the grouping and join operators. The extended grouping operator produces groups of similar tuples, and the extended join combines tuples that satisfy a given similarity condition. We describe the semantics of these operators, discuss efficient implementations for the edit distance similarity, and present evaluation results. Finally, we give application examples from a data reconciliation project for looted art.
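
To make the two operators concrete, here is a minimal sketch in Python, not the authors' implementation. It assumes edit distance with unit costs (Levenshtein) as the similarity measure and a fixed maximum distance as the similarity condition. It also assumes that similarity grouping places tuples in the same group when they are linked by a chain of similar pairs (a transitive closure), which is one common interpretation but may differ from the paper's exact semantics. The function names `edit_distance`, `similarity_join` and `similarity_group`, the `max_dist` parameter, the length-difference filter and the union-find structure are all illustrative choices, not taken from the paper.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert/delete/substitute), two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def similar(x: str, y: str, max_dist: int) -> bool:
    # The length difference is a lower bound on edit distance, so it is a cheap pre-filter.
    return abs(len(x) - len(y)) <= max_dist and edit_distance(x, y) <= max_dist


def similarity_join(left: List[Row], right: List[Row], key_l: str, key_r: str,
                    max_dist: int) -> List[Tuple[Row, Row]]:
    """Hypothetical nested-loop similarity join on one string attribute."""
    return [(l, r) for l in left for r in right
            if similar(l[key_l], r[key_r], max_dist)]


def similarity_group(rows: List[Row], key: str, max_dist: int) -> List[List[Row]]:
    """Hypothetical similarity grouping: connected components of the 'similar' relation."""
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if similar(rows[i][key], rows[j][key], max_dist):
                parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[Row]] = {}
    for i, row in enumerate(rows):
        groups.setdefault(find(i), []).append(row)
    return list(groups.values())


if __name__ == "__main__":
    rows = [{"a": "Rembrandt"}, {"a": "Rembrant"}, {"a": "Rembran"}, {"a": "Monet"}]
    print([[r["a"] for r in g] for g in similarity_group(rows, "a", 1)])
```

In the usage example, "Rembrandt" and "Rembran" are two edits apart, yet they end up in the same group because each is within one edit of "Rembrant". This shows both the appeal and the risk of transitive-closure grouping, since long chains can merge dissimilar values. The all-pairs loops are quadratic and serve only to illustrate the semantics. Efficient implementations, which the abstract says the paper discusses, would avoid comparing every pair.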
