On active learning of record matching packages

Mon, 04/11/2011 - 08:35 — cat

Authors:

Arasu, A; Götz, M; Kaushik, R.

Year:

2010

Venue:

Proc. ACM SIGMOD Conf.

URL:

http://portal.acm.org/citation.cfm?id=1807252

Citations range:

10 - 49

Attachment	Size
Arasu2010Onactivelearningofrecordmatchingpackages.pdf	21.95 KB

We consider the problem of learning a record matching package (classifier) in an active learning setting. In active learning, the learning algorithm picks the set of examples to be labeled, unlike the more traditional passive learning setting, where a user selects the labeled examples. Active learning is important for record matching since manually identifying a suitable set of labeled examples is difficult. Previous algorithms that use active learning for record matching have serious limitations: the packages that they learn lack quality guarantees, and the algorithms do not scale to large input sizes. We present new algorithms for this problem that overcome these limitations. Our algorithms are fundamentally different from traditional active learning approaches and are designed from the ground up to exploit problem characteristics specific to record matching. We include a detailed experimental evaluation on real-world data demonstrating the effectiveness of our algorithms.
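To make the active/passive distinction in the abstract concrete, here is a minimal sketch of a generic active learner for record matching. It is not the paper's algorithm, which the abstract describes as fundamentally different from traditional approaches. The sketch assumes a single Jaccard similarity score, assumes that match labels are monotone in that score, and simulates the human labeler with a hypothetical `oracle` function; the names `jaccard`, `learn_threshold`, and `budget` are illustrative only.

```python
import math
from typing import Callable, List, Tuple

Pair = Tuple[str, str]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def learn_threshold(
    pool: List[Pair], oracle: Callable[[Pair], bool], budget: int
) -> Tuple[float, int]:
    """Pick which pairs to label (the active part) and learn a threshold.

    Assumption: if a pair is a match, every pair with higher similarity
    is also a match. Under that assumption a binary search over the
    sorted pool needs only O(log n) labels.
    Returns (threshold, number_of_labels_requested).
    """
    scored = sorted(pool, key=lambda p: jaccard(*p))
    lo, hi = 0, len(scored)  # first matching index lies in [lo, hi]
    asked = 0
    while lo < hi and asked < budget:
        mid = (lo + hi) // 2
        asked += 1
        if oracle(scored[mid]):   # learner chose this pair, labeler answers
            hi = mid              # match: boundary is at or below mid
        else:
            lo = mid + 1          # non-match: boundary is above mid
    # hi is either len(scored) or an index confirmed as a match,
    # so using it gives a conservative threshold even if budget ran out.
    threshold = jaccard(*scored[hi]) if hi < len(scored) else math.inf
    return threshold, asked


# Hypothetical data; the oracle stands in for a human labeler.
pool = [
    ("acme corp seattle", "acme corp seattle wa"),
    ("john smith", "jon smith"),
    ("alpha beta", "gamma delta"),
    ("main street cafe", "main street cafe"),
    ("blue sky labs", "blue sky labs inc"),
]
simulated_oracle = lambda pair: jaccard(*pair) >= 0.5

threshold, asked = learn_threshold(pool, simulated_oracle, budget=10)
print(threshold, asked)
```

In a passive setting the user would have to choose and label pairs up front; here the learner requests labels only for the pairs that narrow down the decision boundary. A single-threshold classifier is a deliberate simplification and says nothing about the quality guarantees or scalability the paper targets.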
