The Personal Name Problem And a Recommended Data Mining Solution

Phua, C; Lee, V; Smith, K
Encyclopedia of Data Warehousing and Mining (2nd Edition)

The personal name problem is the situation where the authenticity,
ordering, gender, and other information cannot be determined
correctly and automatically for every incoming personal name. A
novel solution, tested on scoring data, is to mine a comprehensive
external name dictionary with a set of chosen techniques made up
of exact matching, phonetics (extended Soundex), string similarity
metrics (Levenshtein), and classifiers (naïve Bayes algorithm). The main
contribution of this paper is in the evaluation of and selection
from five very different approaches and the empirical comparisons
of multiple phonetic and string similarity techniques for the
personal name problem. Other contributions include relating
personal names mining to credit application fraud detection and
other security systems, and making the labelled data and
techniques available for future studies. In reality, there is no
silver-bullet solution to this problem, but it can be alleviated by
applying appropriate techniques to sufficient name data.
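
As a rough illustration of how the matching techniques named above could be
chained against a name dictionary, here is a minimal Python sketch. It is not
the authors' implementation. It assumes a fallback order of exact match, then
phonetic match, then edit distance. It swaps in standard American Soundex for
the extended Soundex used in the paper and omits the naïve Bayes classifier.
The dictionary entries, the lookup function, and the max_distance threshold
are all hypothetical.

```python
# Hypothetical toy dictionary (name -> gender); the paper mines a far larger one.
NAME_DICTIONARY = {"catherine": "F", "steven": "M", "vincent": "M"}


def soundex(name: str) -> str:
    """Standard American Soundex (an assumption; the paper uses an extended variant)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    result = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants that share a code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]  # pad or truncate to one letter plus three digits


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lookup(name: str, dictionary: dict, max_distance: int = 1):
    """Return (match_type, dictionary_entry), trying the cheapest technique first."""
    key = name.strip().lower()
    if key in dictionary:
        return ("exact", key)
    code = soundex(key)
    for entry in dictionary:
        if soundex(entry) == code:
            return ("phonetic", entry)
    best = min(dictionary, key=lambda e: levenshtein(key, e), default=None)
    if best is not None and levenshtein(key, best) <= max_distance:
        return ("edit", best)
    return ("unknown", None)


if __name__ == "__main__":
    for incoming in ("Steven", "Stephen", "Bincent", "Xqz"):
        print(incoming, lookup(incoming, NAME_DICTIONARY))
```

Here "Stephen" is not in the toy dictionary but shares a Soundex code with
"steven", while the typo "Bincent" has a different Soundex code and is only
recovered by the edit-distance step. A name that fails all three steps is
reported as unknown, which is where a classifier such as naïve Bayes might
take over.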