Record Matching over Query Results from Multiple Web Databases

Su, W; Wang, J; Lochovsky, FH
IEEE Transactions on Knowledge and Data Engineering

Record matching, which identifies the records that represent the same real-world entity, is an important step for data integration. Most state-of-the-art record matching methods are supervised, requiring the user to provide training data. These methods are not applicable to the Web database scenario, where the records to match are query results dynamically generated on the fly. Such records are query-dependent, and a prelearned method using training examples from previous query results may fail on
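
To make the task concrete, here is a minimal sketch of training-free record matching between two result lists: each pair of records is scored by the average per-field string similarity, and pairs at or above a threshold are reported as matches. This is a generic baseline for illustration, not the method proposed in the paper; the function names, the field list, the sample records, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1] after trimming and lowercasing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(r1: dict, r2: dict, fields: list[str]) -> float:
    """Average per-field similarity over the fields both sources share."""
    return sum(field_similarity(r1[f], r2[f]) for f in fields) / len(fields)


def match_records(left: list[dict], right: list[dict], fields: list[str],
                  threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return (i, j) index pairs whose similarity meets the threshold.

    No training data is used; the threshold is a hypothetical, hand-picked value.
    """
    pairs = []
    for i, r1 in enumerate(left):
        for j, r2 in enumerate(right):
            if record_similarity(r1, r2, fields) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical query results returned by two different Web databases.
site_a = [{"title": "Data Integration", "author": "A. Halevy"},
          {"title": "Web Mining", "author": "B. Liu"}]
site_b = [{"title": "data integration", "author": "A Halevy"},
          {"title": "Information Retrieval", "author": "C. Manning"}]

print(match_records(site_a, site_b, ["title", "author"]))
```

Only the first record from each site is paired, because case and punctuation differences barely lower the similarity score. A fixed threshold like this is exactly the kind of query-independent setting that may not transfer well across different result sets.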

Data fusion

Bleiholder, J; Naumann, F
ACM Computing Surveys

The development of the Internet in recent years has made it possible and useful to access many different information systems anywhere in the world to obtain information. While there is much research on the integration of heterogeneous information systems, most commercial systems stop short of the actual integration of available data. Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation.
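
As an illustration of the fusion step, the sketch below assumes the records have already been matched and resolves each attribute independently by majority vote over the non-null values, breaking ties in favor of the value seen first. Majority voting is only one of several possible conflict-resolution strategies, and the `fuse` function and sample records are hypothetical.

```python
from collections import Counter
from typing import Any, Optional


def fuse(records: list[dict[str, Any]]) -> dict[str, Optional[Any]]:
    """Fuse records that describe the same real-world object into one record.

    Each attribute is resolved on its own: None values are ignored, and the
    most frequent remaining value wins (Counter.most_common keeps first-seen
    order among equal counts, so ties go to the earliest value).
    """
    # Collect attribute names in first-seen order across all records.
    attributes: list[str] = []
    for record in records:
        for name in record:
            if name not in attributes:
                attributes.append(name)

    fused: dict[str, Optional[Any]] = {}
    for name in attributes:
        values = [r[name] for r in records if r.get(name) is not None]
        fused[name] = Counter(values).most_common(1)[0][0] if values else None
    return fused


# Hypothetical duplicates of one product, with conflicting and missing values.
duplicates = [
    {"name": "Acme Widget", "price": 19.99, "color": None},
    {"name": "ACME Widget", "price": 19.99, "color": "red"},
    {"name": "Acme Widget", "price": 17.50, "color": "red"},
]
print(fuse(duplicates))
```

Voting handles both uncertainty (the missing color) and contradiction (the two prices); other strategies, such as preferring a trusted source or the most recent value, would only change how each attribute's value is chosen.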

Learning object identification rules for information integration

Tejada, S; Knoblock, CA; Minton, S
Information Systems

When integrating information from multiple websites, the same data objects can exist in inconsistent text formats across sites, making it difficult to identify matching objects using exact text match. We have developed an object identification system called Active Atlas, which compares the objects’ shared attributes in order to identify matching objects. Certain attributes are more important for deciding if a mapping should exist between two objects. Previous methods of object identification have required manual construction of object identification rules or mapping rules for
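
The idea that some shared attributes matter more than others can be sketched as a weighted attribute-similarity score. The example below tokenizes each attribute so that formatting differences such as punctuation are ignored, compares tokens with Jaccard similarity, and combines the attributes with hand-set weights. This is not Active Atlas: the weights, the token similarity measure, the function names, and the sample objects are all hypothetical stand-ins, and hand-set weights are precisely the kind of manual rule construction the abstract describes.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; punctuation and spacing are ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical weights: the name is assumed to say more about identity than the city.
WEIGHTS = {"name": 0.6, "street": 0.3, "city": 0.1}


def mapping_score(obj1: dict, obj2: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-attribute similarities over the shared attributes."""
    total = sum(weights.values())
    return sum(w * jaccard(obj1[a], obj2[a]) for a, w in weights.items()) / total


# Hypothetical objects from two sites that format the same street differently.
site1 = {"name": "Joe's Diner", "street": "100 Main St.", "city": "Springfield"}
site2 = {"name": "Joe's Diner", "street": "100 Main Street", "city": "Springfield"}
other = {"name": "Joe's Pizza", "street": "250 Oak Ave.", "city": "Springfield"}

print(round(mapping_score(site1, site2), 2))
print(round(mapping_score(site1, other), 2))
```

The first pair scores about 0.85 despite the "St." versus "Street" mismatch, while the second pair scores about 0.4 even though the city matches exactly, because the low-weight city attribute contributes little to the decision.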
