Exploiting secondary sources for automatic object consolidation

Michalowski, M; Thakkar, S; Knoblock, CA
Proc. 2003 ACM SIGKDD Workshop on Data Cleaning, Record Linkage, and Object Consolidation
Citations range: 10 - 49

Information sources on the web are controlled by different
organizations or people, utilize different text formats, and
have varying inconsistencies. Therefore, any system that integrates
information from different data sources must consolidate
data from these sources. Many data sources on the web,
however, do not contain enough information to consolidate
the data accurately, even with state-of-the-art
object consolidation systems. We present an approach to
accurately and automatically consolidate data from various
data sources by using a state-of-the-art object consolidation
system in conjunction with a mediator system. The
mediator system is able to automatically determine which
secondary sources need to be queried in cases where the
object consolidation system is unable to confidently determine
whether two records refer to the same entity. The object
consolidation system then uses this additional information
to improve the accuracy of consolidation between
datasets.
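
The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the token-overlap (Jaccard) scorer, the two thresholds, the field names, and the toy geocoder standing in for a secondary source are all assumptions made for illustration, and the records and coordinates are invented. The sketch shows only the general idea that a secondary source is consulted when the primary comparison is inconclusive.

```python
# Illustrative sketch only; scorer, thresholds, and data are assumptions.

def jaccard(a, b):
    """Token-overlap similarity between two strings (0.0 to 1.0)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def match_score(r1, r2, fields=("name", "address")):
    """Mean field similarity; a stand-in for a real consolidation system."""
    return sum(jaccard(r1[f], r2[f]) for f in fields) / len(fields)

def consolidate(r1, r2, secondary_lookup, low=0.2, high=0.8):
    """Decide whether two records refer to the same entity.

    The secondary source is queried only when the primary score falls
    between the two (hypothetical) thresholds.
    """
    score = match_score(r1, r2)
    if score >= high:
        return "match"
    if score <= low:
        return "non-match"
    # Uncertain zone: ask the secondary source for extra evidence.
    e1, e2 = secondary_lookup(r1), secondary_lookup(r2)
    if e1 is None or e2 is None:
        return "uncertain"
    return "match" if e1 == e2 else "non-match"

# Toy secondary source: a made-up geocoder mapping addresses to coordinates.
GEOCODER = {
    "1 main st": (10.0, 20.0),
    "1 main street": (10.0, 20.0),
}

def geocode(record):
    return GEOCODER.get(record["address"].lower())

a = {"name": "Joe's Diner", "address": "1 Main St"}
b = {"name": "Joe's Restaurant", "address": "1 Main Street"}
c = {"name": "Blue Cafe", "address": "99 Oak Ave"}

print(consolidate(a, b, geocode))
print(consolidate(a, c, geocode))
```

In this toy data, records `a` and `b` overlap only partially on name and address, so the primary score alone is inconclusive; the shared coordinates returned by the hypothetical geocoder settle the decision. A real system would presumably feed the secondary evidence back into the consolidation model rather than apply a hard equality test.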