Mining Document Collections to Facilitate Accurate Approximate Entity Matching


Thu, 09/03/2009 - 10:59 — koepcke

Authors:

Chaudhuri, Surajit; Ganti, Venkatesh; Xin, Dong

Author:

Chaudhuri, S

Xin, D

Ganti, V

Year:

2009

Venue:

VLDB

Citations:

Citations range:

10 - 49

Attachment	Size
vldb09-315.pdf	412.18 KB

Many entity extraction techniques leverage large reference
entity tables to identify entities in documents. Often, an
entity is referenced in document collections differently from
how it appears in the reference entity tables. Therefore, we
study the problem of determining whether or not a substring
"approximately" matches a reference entity. Similarity
measures which exploit the correlation between candidate
substrings and reference entities across a large number of
documents are known to be more robust than traditional
standalone string-based similarity functions. However, such an
approach has significant efficiency challenges. In this paper,
we adopt a new architecture and propose new techniques
to address these efficiency challenges. We mine document
collections and expand a given reference entity table with
variations of each of its entities. Thus, the problem of
approximately matching an input string against reference
entities reduces to that of exact match against the expanded
reference table, which can be implemented efficiently. In
an extensive experimental evaluation, we demonstrate the
accuracy and scalability of our techniques.
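
The reduction described in the abstract, from approximate matching to exact lookup in an expanded table, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the mining step is skipped entirely and the variations are assumed to be given, and the names `normalize`, `build_expanded_table`, `find_mentions`, and the sample entity are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, ...]


def normalize(text: str) -> Key:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return tuple(text.lower().split())


def build_expanded_table(variations: Dict[str, Iterable[str]]) -> Dict[Key, str]:
    """Map each reference entity and each of its variations to the entity.

    `variations` stands in for the output of the mining step, which this
    sketch assumes has already been computed elsewhere.
    """
    table: Dict[Key, str] = {}
    for entity, variants in variations.items():
        table[normalize(entity)] = entity
        for variant in variants:
            # Keep the first owner if two entities share a variation.
            table.setdefault(normalize(variant), entity)
    return table


def find_mentions(document: str, table: Dict[Key, str]) -> List[Tuple[str, str]]:
    """Return (matched substring, reference entity) pairs via exact lookup."""
    tokens = normalize(document)
    max_len = max((len(key) for key in table), default=0)
    mentions: List[Tuple[str, str]] = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            window = tokens[start:start + length]
            if len(window) < length:
                break  # ran past the end of the document
            if window in table:
                mentions.append((" ".join(window), table[window]))
    return mentions


# Hypothetical reference entity with two assumed variations.
example_table = build_expanded_table(
    {"Microsoft Xbox 360 Console": ["Xbox 360", "Microsoft Xbox 360"]}
)
example_mentions = find_mentions("I bought an Xbox 360 yesterday", example_table)
```

Once the table is built, each candidate substring costs a single hash lookup, which is the efficiency argument the abstract makes for moving the expensive work into an offline expansion step.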

