Learning string-edit distance

Authors: 
Ristad, ES; Yianilos, PN; Inc, M.T.; Princeton, NJ
Author: 
Ristad, E
Yianilos, P
Inc, M
Princeton, N
Year: 
1998
Venue: 
Pattern Analysis and Machine Intelligence, IEEE Transactions on, 20, 1998
URL: 
http://citeseer.ist.psu.edu/86609.html
Citations: 
498
Citations range: 
100 - 499

In many applications, it is necessary to determine the similarity of two strings.
A widely-used notion of string similarity is the edit distance: the minimum
number of insertions, deletions, and substitutions required to transform one
string into the other. In this report, we provide a stochastic model for string
edit distance. Our stochastic model allows us to learn a string edit distance
function from a corpus of examples. We illustrate the utility of our approach
by applying it to the difficult problem of learning the pronunciation of words in
conversational speech. In this application, we learn a string edit distance with
one fourth the error rate of the untrained Levenshtein distance. Our approach
is applicable to any string classification problem that may be solved using a
similarity function against a database of labeled prototypes.