Two algorithms for approximate string matching in static texts

Jokinen, P.; Ukkonen, E.
Jokinen, P
Ukkonen, E
Proceedings Mathematical Foundations of Computer Science 1991,
Citations range: 

Algorithms with optimal expected running time are presented for searching the occurrences of a two-dimensional m X m pattern P in a two-dimensional n X n text T over an alphabet of size c. The algorithms are based on placing in the text a static grid of test points, determined only by n, m, and c (not dynamically by earlier test results). Using test strings read from the test points the algorithms eliminate as many potential occurrences of P as possible. The remaining potential occurrences are separately checked for actual occurrences. A suitable choice of the test point set leads to algorithms with expected running time O(n2logc m2/m2) using the uniform Bernoulli model of randomness. This is shown to be optimal by a generalization of a one-dimensional lower bound result by Yao. Experimental results show that the algorithms are efficient in practice, too. The method is also generalized for the k mismatches problem. The resulting algorithm has expected running time O(kn2logc m2/m2), provided that $k\leq(m\lfloor m/\lceil\log_c m^2\rceil\rfloor-1)/2$.\ All algorithms need preprocessing of P which takes time and space O(m2). The text processing can be done on-line, using a rather small window. The algorithms easily generalize to d-dimensional matching for any d.