Research and industry have tackled the object identification
problem of data integration in many different ways.
This paper presents a framework that allows the evaluation of
competing approaches. To this end, complexity measures and
data characteristics are introduced, which reflect the hardness
of a given object identification problem. All characteristics can be
estimated using simple SQL queries and simple calculations.
Following the principle of benchmark definitions, we specify a test
framework. It consists of a test database and its characteristics,
quality criteria, and a test specification. Adequate measures
needed for the correctness criterion of the benchmark are given.
A running example of the Berlin Online Apartment-Advertisements
database (BOA) illustrates the approach. The BOA database is
freely available at www.wiwiss.fu-berlin.de/lenz/boa/.