Improving Grouped-Entity Resolution using Quasi-Cliques

Authors: 
On, BW; Elmacioglu, E; Lee, D; Kang, J; Pei, J
Year: 
2006
Venue: 
ICDM
URL: 
http://www.cs.sfu.ca/~jpei/publications/grouped-entity-icdm06.pdf
Citations: 
0
Citations range: 
n/a
Attachment: 
On2006ImprovingGroupedEntity.pdf (1.11 MB)

Entity resolution (ER), the problem of identifying duplicate
entities that refer to the same real-world entity, is
essential in many applications. In this paper, we focus on
resolving entities that each contain a group of related
elements (e.g., an author entity with a list of citations,
a singer entity with a song list, or an intermediate result
of a SQL GROUP BY query). Such entities, called
grouped-entities, occur frequently in many applications.
Previous approaches to grouped-entity resolution
often rely on textual similarity and produce a large number
of false positives. As a complementary technique, in
this paper we present our experience of applying a recently
proposed graph mining technique, Quasi-Clique, atop conventional
ER solutions. Our approach exploits contextual
information mined from the group of elements per entity in
addition to syntactic similarity. Extensive experiments verify
that our proposal improves precision and recall up to
83% when used together with a variety of existing ER solutions,
but never worsens them.
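
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the underlying idea, not the authors' method. It assumes the common degree-based definition of a quasi-clique (every vertex in the set is adjacent to at least gamma times the number of other vertices in the set) and assumes that an entity's group of elements is turned into a co-occurrence graph. The function names, the parameter gamma, and the sample co-author lists are all hypothetical.

```python
from itertools import combinations


def cooccurrence_edges(groups):
    """Build undirected edges between items that appear in the same element.

    `groups` is a hypothetical representation of a grouped-entity, e.g. the
    co-author list of each citation attached to one author entity.
    """
    edges = set()
    for group in groups:
        # Sorting makes each undirected edge appear in one canonical order.
        for u, v in combinations(sorted(set(group)), 2):
            edges.add((u, v))
    return edges


def is_quasi_clique(vertices, edges, gamma):
    """Check the assumed degree-based quasi-clique condition.

    Returns True if every vertex in `vertices` is adjacent to at least
    gamma * (len(vertices) - 1) of the other vertices in the set.
    """
    vertices = set(vertices)
    if len(vertices) < 2:
        return False
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices and u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)
    threshold = gamma * (len(vertices) - 1)
    return all(len(neighbors) >= threshold for neighbors in adjacency.values())


# Hypothetical grouped-entity: two citations with their co-author lists.
edges = cooccurrence_edges([["A", "B", "C"], ["B", "C", "D"]])

loose = is_quasi_clique({"A", "B", "C", "D"}, edges, 0.6)   # A and D are not linked
strict = is_quasi_clique({"A", "B", "C", "D"}, edges, 1.0)  # requires a full clique
```

Under these assumptions, two entities whose element groups yield similar dense subgraphs share contextual evidence of being the same real-world entity, which can then be combined with a textual similarity score from a conventional ER method.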