infotech.monash.edu

The Personal Name Problem And a Recommended Data Mining Solution

Authors: Phua, C; Lee, V; Smith, K
Year: 2006
Venue: Encyclopedia of Data Warehousing and Mining (2nd Edition)

The personal name problem is the situation where the authenticity,
ordering, gender, and other information cannot be determined
correctly and automatically for every incoming personal name. A
novel solution, tested on scoring data, is to mine a comprehensive
external name dictionary with a set of chosen techniques: exact
matching, phonetics (extended Soundex), simmetrics (Levenshtein),
and classifiers (naïve Bayes algorithm). The main
contribution of this paper is in the evaluation of and selection
from five very different approaches and the empirical comparisons
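
To make the matching techniques named in the abstract concrete, here is a minimal sketch of a lookup cascade against a name dictionary: exact matching first, then a phonetic match, then a Levenshtein edit-distance match. It is an illustration under stated assumptions, not the authors' implementation. It uses standard American Soundex rather than the extended Soundex the abstract mentions, it leaves out the naïve Bayes classifier, and the dictionary entries, the function names, and the `max_edits` threshold are all hypothetical.

```python
# Standard American Soundex digit groups (an assumption: the abstract
# refers to an "extended" Soundex whose rules are not given here).
_CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(name):
    """Return the 4-character standard Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            encoded += code
        # 'h' and 'w' do not separate letters that share a code.
        if c not in "hw":
            prev = code
    return (encoded + "000")[:4]


def levenshtein(a, b):
    """Return the edit distance (insert, delete, substitute) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def match_name(name, dictionary, max_edits=1):
    """Try exact, then phonetic, then edit-distance matching, in that order."""
    key = name.strip().lower()
    if key in dictionary:
        return ("exact", key)
    code = soundex(key)
    for entry in dictionary:
        if soundex(entry) == code:
            return ("phonetic", entry)
    for entry in dictionary:
        if levenshtein(key, entry) <= max_edits:
            return ("edit", entry)
    return ("none", None)


# Hypothetical three-entry dictionary, for illustration only.
NAMES = ["catherine", "john", "smith"]
```

With this hypothetical dictionary, `match_name("Smyth", NAMES)` falls through the exact check and matches "smith" phonetically, while `match_name("Jokn", NAMES)` reaches the edit-distance stage and matches "john". The ordering from strictest to loosest test is a design choice of this sketch, so that a looser technique is consulted only when a stricter one finds nothing.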
