Exploiting context analysis for combining multiple entity resolution systems

Authors: 
Chen, Zhaoqi; Kalashnikov, Dmitri V.; Mehrotra, Sharad
Year: 
2009
Venue: 
SIGMOD
Citations: 
25
Citations range: 
10 - 49
Attachment: 
Exploiting context analysis for combining multiple entity resolution systems.pdf (457.35 KB)

Entity Resolution (ER) is an important real-world problem that has attracted significant research interest over the past few years. It deals with determining which object descriptions co-refer in a dataset. Due to its practical significance for data mining and data analysis tasks, many different ER approaches have been developed to address the ER challenge. This paper proposes a new ER Ensemble framework. The task of ER Ensemble is to combine the results of multiple base-level ER systems into a single solution with the goal of increasing the quality of ER. The framework leverages the observation that no single ER method consistently outperforms the other ER techniques in terms of quality. Instead, different ER solutions perform better in different contexts. The framework employs two novel combining approaches, both based on supervised learning. The two approaches learn a mapping of the clustering decisions of the base-level ER systems, together with the local context, into a combined clustering decision. The paper empirically studies the framework by applying it to different domains. The experiments demonstrate that the proposed framework achieves significantly higher disambiguation quality than the current state-of-the-art solutions.
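
The idea of learning a mapping from base-level decisions plus local context to a combined decision can be illustrated with a small sketch. This is not the paper's method: the abstract does not describe the two combining approaches in detail, so the example below substitutes a simple lookup-table learner, and every name in it (`train_combiner`, `predict`, `cluster`, the context buckets "A" and "B") is a hypothetical chosen for illustration. It assumes each base system emits a 0/1 co-reference vote per pair of records and that context can be reduced to a discrete bucket.

```python
from collections import Counter, defaultdict


def train_combiner(examples):
    """Learn a lookup table from (context, base decisions) to the majority label.

    Each example is (context, decisions, label): `context` is a hashable
    bucket (hypothetical), `decisions` is a tuple of 0/1 votes from the
    base ER systems, and `label` is 1 if the pair truly co-refers.
    """
    counts = defaultdict(Counter)
    for context, decisions, label in examples:
        counts[(context, tuple(decisions))][label] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def predict(model, context, decisions):
    """Return the learned decision; fall back to a strict majority vote if unseen."""
    key = (context, tuple(decisions))
    if key in model:
        return model[key]
    return int(2 * sum(decisions) > len(decisions))


def cluster(items, merged_pairs):
    """Turn pairwise merge decisions into clusters via union-find."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in merged_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for x in items:
        groups[find(x)].append(x)
    return sorted(sorted(g) for g in groups.values())


# Toy training data: system 0 is reliable in context "A", system 1 in context "B".
training = [
    ("A", (1, 0), 1), ("A", (0, 1), 0), ("A", (1, 0), 1),
    ("B", (1, 0), 0), ("B", (0, 1), 1), ("B", (0, 1), 1),
]
model = train_combiner(training)

# Unseen pairs: (record_a, record_b, context, base decisions).
pairs = [
    ("r1", "r2", "A", (1, 0)),
    ("r2", "r3", "A", (0, 1)),
    ("r4", "r5", "B", (0, 1)),
]
merged = [(a, b) for a, b, ctx, d in pairs if predict(model, ctx, d) == 1]
clusters = cluster(["r1", "r2", "r3", "r4", "r5"], merged)
print(clusters)  # [['r1', 'r2'], ['r3'], ['r4', 'r5']]
```

In this toy setup the same vote pattern `(1, 0)` leads to a merge in context "A" but not in context "B", which is the kind of context-dependent behavior the abstract describes. A realistic combiner would likely use a richer classifier and continuous context features.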