Efficient similarity search for hierarchical data in large databases

Kailing, K.; Kriegel, H.P.; Schonauer, S.; Seidl, T.
Kailing, K
Kriegel, H
Schonauer, S
Seidl, T
Proc. 9th Int. Conf. on Extending Database Technology (EDBT’04), 2004
Citations range: 
Kailing2004Efficientsimilaritysearchforhierarchicaldatainlarge.pdf1.03 MB

Structured and semi-structured object representations are getting more
and more important for modern database applications. Examples for such data are
hierarchical structures including chemical compounds, XML data or image data.
As a key feature, database systems have to support the search for similar objects
where it is important to take into account both the structure and the content features
of the objects. A successful approach is to use the edit distance for tree
structured data. As the computation of this measure is NP-complete, constrained
edit distances have been successfully applied to trees. While yielding good results,
they are still computationally complex and, therefore, of limited benefit for
searching in large databases. In this paper, we propose a filter and refinement
architecture to overcome this problem. We present a set of new filter methods
for structural and for content-based information in tree-structured data as well
as ways to flexibly combine different filter criteria. The efficiency of our methods,
resulting from the good selectivity of the filters is demonstrated in extensive
experiments with real-world applications.