Click a term to initiate a search.
In many applications, it is necessary to determine the similarity of two strings.
A widely-used notion of string similarity is the edit distance: the minimum
number of insertions, deletions, and substitutions required to transform one
string into the other. In this report, we provide a stochastic model for string
edit distance. Our stochastic model allows us to learn a string edit distance
function from a corpus of examples. We illustrate the utility of our approach
by applying it to the difficult problem of learning the pronunciation of words in
conversational speech. In this application, we learn a string edit distance with
one fourth the error rate of the untrained Levenshtein distance. Our approach
is applicable to any string classification problem that may be solved using a
similarity function against a database of labeled prototypes.