On evaluation and training-set construction for duplicate detection


Wed, 10/11/2006 - 10:37 — cat

Authors:

Bilenko, M; Mooney, RJ


Year:

2003

Venue:

Proceedings of the KDD-2003 workshop on data cleaning

URL:

http://www.cs.utexas.edu/~ml/papers/marlin-kdd-wkshp-03.pdf


Attachment	Size
Bilenko2003Onevaluationandtrainingset.pdf	117.11 KB

A variety of experimental methodologies have been used to evaluate
the accuracy of duplicate-detection systems. We advocate presenting
precision-recall curves as the most informative evaluation
methodology. We also discuss a number of issues that arise when
evaluating and assembling training data for adaptive systems that
use machine learning to tune themselves to specific applications.
We consider several different application scenarios and experimentally
examine the effectiveness of alternative methods of collecting
training data under each scenario. We propose two new approaches
to collecting training data, called static-active learning and weakly-labeled
non-duplicates, and present experimental results on their
effectiveness.
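
As a rough illustration of the evaluation the abstract advocates, the sketch below computes precision-recall points for a duplicate detector by sweeping a threshold over pairwise similarity scores. It is not taken from the paper: the function name, the input layout (one similarity score and one ground-truth flag per candidate record pair), and the sample scores are all assumptions made for this example.

```python
from typing import List, Tuple


def precision_recall_points(
    scored_pairs: List[Tuple[float, bool]],
) -> List[Tuple[float, float]]:
    """Return (recall, precision) points from sweeping a similarity threshold.

    scored_pairs holds one (similarity, is_true_duplicate) entry per candidate
    record pair. This input layout is a hypothetical choice for the sketch.
    """
    total_duplicates = sum(1 for _, is_dup in scored_pairs if is_dup)
    if total_duplicates == 0:
        raise ValueError("need at least one true duplicate pair")

    # Rank pairs from most to least similar; every prefix of this ranking is
    # the set of pairs a detector would report at some threshold.
    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)

    points = []
    true_positives = 0
    for rank, (score, is_dup) in enumerate(ranked, start=1):
        if is_dup:
            true_positives += 1
        # A threshold can only fall between distinct scores, so emit a point
        # at the end of each group of tied scores.
        if rank == len(ranked) or ranked[rank][0] != score:
            recall = true_positives / total_duplicates
            precision = true_positives / rank
            points.append((recall, precision))
    return points


# Made-up scores for five candidate pairs, three of which are true duplicates.
example_pairs = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.40, False)]
example_points = precision_recall_points(example_pairs)
```

Plotting the resulting points with recall on one axis and precision on the other shows the trade-off across all thresholds at once, rather than a single accuracy figure tied to one cutoff.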
