Combining schema and instance information for integrating heterogeneous data sources

Zhao, Huimin; Ram, Sudha
Data and Knowledge Engineering

Determining the correspondences among heterogeneous data sources, which is critical to integration of the data
sources, is a complex and resource-consuming task that demands automated support. We propose an iterative procedure
for detecting both schema-level and instance-level correspondences from heterogeneous data sources. Cluster analysis techniques
are used first to identify similar schema elements (i.e., relations and attributes). Based on the identified schema-level

Entity identification for heterogeneous database integration: a multiple classifier system approach and empirical evaluation

Zhao, Huimin; Ram, Sudha
Information Systems

Entity identification, i.e., detecting semantically corresponding records from heterogeneous data sources, is a critical step in integrating the data sources. The objective of this research is to develop and evaluate a novel multiple classifier system approach that improves entity identification accuracy. We apply various classification techniques drawn from statistical pattern recognition, machine learning, and artificial neural networks to determine whether two records from different data sources represent the same real-world entity.
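
As a rough illustration of the idea described in this abstract, the sketch below decides whether two records refer to the same entity by combining several base classifiers. It is not the authors' method: the three rule-based voters, their thresholds, the majority-vote combiner, and the field names are all hypothetical stand-ins for the statistical, machine learning, and neural network classifiers the abstract mentions.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def feature_vector(rec_a: dict, rec_b: dict, fields: list) -> list:
    """One similarity score per compared field, in the order given."""
    return [field_similarity(rec_a[f], rec_b[f]) for f in fields]


# Hypothetical base classifiers: each maps a feature vector to a match vote.
# A real system would use trained models in their place.
def mean_rule(features: list) -> bool:
    return sum(features) / len(features) >= 0.8


def min_rule(features: list) -> bool:
    return min(features) >= 0.6


def key_field_rule(features: list) -> bool:
    # Assumes the first field is the most identifying one (e.g. a name).
    return features[0] >= 0.9


BASE_CLASSIFIERS = [mean_rule, min_rule, key_field_rule]


def same_entity(rec_a: dict, rec_b: dict, fields: list) -> bool:
    """Majority vote over the base classifiers (an assumed combination scheme)."""
    features = feature_vector(rec_a, rec_b, fields)
    votes = sum(1 for clf in BASE_CLASSIFIERS if clf(features))
    return votes * 2 > len(BASE_CLASSIFIERS)
```

Calling `same_entity(record_from_source_1, record_from_source_2, ["name", "city"])` returns a single match decision. Using an odd number of voters avoids ties under majority voting.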
