Object Matching for Information Integration: A Profiler-Based Approach

Doan, AnHai; Lu, Ying; Lee, Yoonkyong; Han, Jiawei
Proceedings of the IJCAI-03 Workshop on Information Integration on the Web

Object matching is a fundamental problem that arises in numerous
information integration scenarios. Virtually all existing
solutions to this problem have assumed that the objects
to be matched share the same set of attributes, and that
they can be matched by comparing the similarities of the attributes.
We consider the more general problem where the objects
can also have disjoint attributes, such as matching tuples
that come from relational tables with schemas (age, name)
and (name, salary), respectively.
We describe PROM, a solution that also exploits the disjoint
attributes to improve matching accuracy. In the above example,
PROM begins by matching any two given tuples based
on the shared attribute name. Then it applies a set of profilers,
each of which contains some knowledge about what
constitutes a typical person. The profilers examine the tuple
pair to see if it can plausibly make up a person. For example,
a profiler may state that because the age is 6 and the salary
is $100,000, the tuples do not make up a person and thus do
not match. Profilers can be manually specified by domain
experts, learned from training data, transferred from other
matching tasks, or constructed from external data. Thus, the
PROM approach is distinguished in that it can not only exploit
disjoint attributes to improve matching accuracy, but
also reuse knowledge from previous object matching tasks.
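
The two-stage idea described above (match on the shared attribute, then
let profilers check the plausibility of the combined tuple) can be
illustrated with a minimal sketch. This is not the PROM implementation:
the function names, the use of difflib for name similarity, the 0.85
threshold, and the age and salary cutoffs are all assumptions chosen for
illustration, and the profiler here issues a hard veto, whereas a real
profiler could just as well return a confidence score.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1] of two names (hypothetical stand-in matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def child_salary_profiler(person):
    """Hand-written profiler: a young child with a large salary is implausible.

    Returns True when the merged tuple could plausibly describe a person.
    The cutoffs (age < 16, salary > 50,000) are illustrative assumptions.
    """
    age = person.get("age")
    salary = person.get("salary")
    if age is None or salary is None:
        return True  # nothing to check, so do not veto
    return not (age < 16 and salary > 50_000)


def match(t1, t2, profilers, name_threshold=0.85):
    """Decide whether two tuples with schemas (age, name) and (name, salary) match."""
    # Stage 1: compare the shared attribute.
    if name_similarity(t1["name"], t2["name"]) < name_threshold:
        return False
    # Stage 2: merge the tuples so the disjoint attributes sit side by side,
    # then let every profiler check the merged record.
    merged = {**t1, **t2}
    return all(profiler(merged) for profiler in profilers)


profilers = [child_salary_profiler]
child = {"age": 6, "name": "Mike Smith"}
adult = {"age": 40, "name": "Mike Smith"}
earner = {"name": "Mike Smith", "salary": 100_000}

print(match(child, earner, profilers))  # names agree, but the profiler vetoes
print(match(adult, earner, profilers))  # names agree and the pair is plausible
```

Because each profiler is an ordinary function over the merged record, the
same profiler could in principle be passed to a different matching task
that involves person tuples, which is the kind of reuse the abstract
mentions.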