Merging the Results of Approximate Match Operations

Guided search

Click a term to initiate a search.

Keyword search

Merging the Results of Approximate Match Operations

Mon, 10/09/2006 - 12:48 — thor

Authors:

Guha, S.; Koudas, N.; Marathe, A.; Srivastava, D.

Author:

Guha, S

Koudas, N

Marathe, A

Srivastava, D

Year:

2004

Venue:

Proceedings of the 30th International Conference on Very Large Databases (VLDB 2004), 2004

URL:

http://www.isys.ucl.ac.be/vldb04/eProceedings/contents/pdf/RS17P2.PDF

Citations:

Citations range:

50 - 99

Attachment	Size
Guha2004MergingtheResultsof.pdf	209.55 KB

Data Cleaning is an important process that has been at
the center of research interest in recent years. An important
end goal of effective data cleaning is to identify
the relational tuple or tuples that are “most related” to
a given query tuple. Various techniques have been proposed
in the literature for efficiently identifying approximate
matches to a query string against a single attribute
of a relation. In addition to constructing a ranking (i.e.,
ordering) of these matches, the techniques often associate,
with each match, scores that quantify the extent
of the match. Since multiple attributes could exist in the
query tuple, issuing approximate match operations for
each of them separately will effectively create a number
of ranked lists of the relation tuples. Merging these lists
to identify a final ranking and scoring, and returning the
top-K tuples, is a challenging task.
In this paper, we adapt the well-known footrule distance
(for merging ranked lists) to effectively deal with
scores. We study efficient algorithms to merge rankings,
and produce the top-K tuples, in a declarative way.
Since techniques for approximately matching a query
string against a single attribute in a relation are typically
best deployed in a database, we introduce and describe
two novel algorithms for this problem and we
provide SQL specifications for them. Our experimental
case study, using real application data along with a
realization of our proposed techniques on a commercial
data base system, highlights the benefits of the proposed
algorithms and attests to the overall effectiveness and
practicality of our approach.

websearch

Data Cleaning publication categorizer

Guided search

Data Cleaning

Data sets

Data type

Paper type

Venue type

Author

Year

mailpart

Citations range

Keyword search

Merging the Results of Approximate Match Operations

Related categories

User login