Advanced methods for record linkage

Authors: 
Winkler, W.E.
Author: 
Winkler, W
Year: 
1994
Venue: 
Proceedings of the Section on, 1994
URL: 
http://citeseer.ist.psu.edu/254560.html
Citations: 
196
Citations range: 
100 - 499

Record linkage, or computer matching, is needed for the creation and maintenance of name and
address lists that support operations for and evaluations of a Year 2000 Census. This paper
describes three advances. The first is an enhanced method of string comparison for dealing with
typographical variations and scanning errors. It improves upon string comparators in computer
science. The second is a linear assignment algorithm that can use only 0.002 as much storage as
existing algorithms in operations research, requires at most an additional 0.03 increase in time, and
has less of a tendency to make erroneous matching assignments than existing sparse-array
algorithms because of how it deals with most arcs. The third is an expectation-maximization
algorithm for estimating parameters in latent class, loglinear models of the type arising in record
linkage. The associated theory and software are the only known means of dealing with general
interaction patterns and allow weak use of a priori information via a generalization to the MCECM
algorithm of Meng and Rubin. Models assuming that interactions are conditionally independent
given the class are typically considered in biostatistics and social science.