Time-completeness trade-offs in record linkage using Adaptive Query Processing

Fri, 02/27/2009 - 13:43 — cat

Authors:

Lengu, R; Missier, P; Fernandes, AAA; G Guerrini, M ..

Year:

2009

URL:

http://www.cs.man.ac.uk/~pmissier/docs/lmfmg-edbt.pdf

Citations range:

1 - 9

Attachment	Size
Lengu2009TimecompletenesstradeoffsinrecordlinkageusingAdaptive.pdf	627.49 KB

Applications that involve data integration among multiple sources often require a preliminary step of data reconciliation in order to ensure that tuples match correctly across the sources. In dynamic settings such as data mashups, however, traditional offline data reconciliation techniques that require prior availability of the data may not be applicable. The alternative, performing similarity joins at query time, is computationally expensive, while ignoring the mismatch problem altogether leads to an incomplete integration. In this paper we make the assumption that, in some dynamic integration scenarios, users may agree to trade the completeness of a join result in return for a faster computation. We explore the consequences of this assumption by proposing a novel, hybrid join algorithm that involves a combination of exact and approximate join operators, managed using adaptive query processing techniques. The algorithm is optimistic: it can switch between physical join operators multiple times throughout query processing, but it only resorts to approximate join operators when there is statistical evidence that result completeness is compromised. Our experimental results show that sensible savings in join execution time can be achieved in practice, at the expense of a modest reduction in result completeness.
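The switching behaviour described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, only an illustration under stated assumptions: all names (`hybrid_join`, `best_approximate_match`, `expected_hit_rate`, `window`, `min_similarity`) are hypothetical, a simple sliding-window hit rate stands in for the statistical evidence the paper refers to, and Python's `difflib.SequenceMatcher` stands in for an approximate join operator.

```python
from collections import deque
from difflib import SequenceMatcher


def best_approximate_match(key, candidates, min_similarity):
    """Return the candidate most similar to key, or None if none is similar enough."""
    best, best_score = None, min_similarity
    for candidate in candidates:
        score = SequenceMatcher(None, key, candidate).ratio()
        if score >= best_score:
            best, best_score = candidate, score
    return best


def hybrid_join(left_rows, right_index, expected_hit_rate=0.9,
                window=4, min_similarity=0.8):
    """Join (key, value) rows against a dict, switching operators adaptively.

    Assumption (not from the paper): the join is "optimistic" in the sense
    that it stays in exact mode until the exact-match rate over the last
    `window` rows drops below `expected_hit_rate`.
    Returns (results, modes): joined triples and the mode used for each row.
    """
    recent_exact_hits = deque(maxlen=window)  # 1 = exact hit, 0 = exact miss
    mode = "exact"
    results, modes = [], []
    for key, value in left_rows:
        modes.append(mode)
        exact_hit = key in right_index
        if exact_hit:
            results.append((key, value, right_index[key]))
        elif mode == "approximate":
            # Expensive path: scan all right-hand keys for a similar one.
            match = best_approximate_match(key, right_index, min_similarity)
            if match is not None:
                results.append((key, value, right_index[match]))
        # In exact mode a miss is simply dropped: this is the completeness loss.
        recent_exact_hits.append(1 if exact_hit else 0)
        # Re-evaluate the operator choice only once the window is full.
        if len(recent_exact_hits) == window:
            rate = sum(recent_exact_hits) / window
            mode = "exact" if rate >= expected_hit_rate else "approximate"
    return results, modes
```

In this sketch, a misspelled key that arrives while the join is still in exact mode is lost, which is the completeness cost. Later misspellings are recovered once the observed hit rate has triggered the switch, which is where the extra computation is spent. The join returns to the exact operator as soon as a full window of rows matches exactly again.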
