Name Disambiguation Using Web Connection

Lu, Y; Nie, Z; Cheng, T; Gao, Y; Wen, JR
Proceedings of AAAI 2007 Workshop on Information Integration ...
Citations range: 1 - 9

Name disambiguation is an important challenge in data cleaning. In this paper, we focus on the problem that multiple real-world objects (e.g., authors, actors) in a dataset share the same name. We show that Web corpora can be exploited to significantly improve the accuracy (i.e., precision and recall) of name disambiguation. We introduce a novel approach called WebNaD (Web-based Name Disambiguation) to effectively measure and use the Web connection between different object appearances of the same name in the local dataset. Our empirical study done in the context of Libra, an academic search engine that indexes 1 million papers, shows the effectiveness of our approach.

... to the same person, it is very likely that they share some coauthors, references, or are indirectly related by a chain of relationships.

Figure 1. Three "Lei Zhang" are found in DBLP.