Data Fusion in Three Steps: Resolving Schema, Tuple, and Value Inconsistencies

Mon, 04/09/2007 - 12:52 — fnaumann

Authors:

Naumann, Felix; Bilke, Alexander; Bleiholder, Jens; Weis, Melanie

Year:

2006

Venue:

IEEE Data Engineering Bulletin 29(2):21-31

URL:

http://www.hpi.uni-potsdam.de/fileadmin/hpi/FG_Naumann/publications/DEBull06.pdf

Attachment	Size
Naumann2006DataFusioninThreeSteps.pdf	323.47 KB

Heterogeneous and dirty data is abundant. It is stored under different, often opaque schemata, it
represents identical real-world objects multiple times, causing duplicates, and it has missing values and
conflicting values. Without suitable techniques for integrating and fusing such data, the data quality of
an integrated system remains low. We present a suite of methods, combined in a single tool, that allows
ad-hoc, declarative fusion of such data by employing schema matching, duplicate detection and data
fusion.
Guided by a SQL-like query against one or more tables, we proceed in three fully automated steps:
First, instance-based schema matching bridges schematic heterogeneity of the tables by aligning
corresponding attributes. Next, duplicate detection techniques find multiple representations of identical
real-world objects. Finally, data fusion and conflict resolution merge each set of duplicates into a single,
consistent, and clean representation.
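
The three steps named in the abstract can be illustrated with a small Python sketch. It is not the authors' tool or their algorithms: every function name here is hypothetical, and each step uses a deliberately simple stand-in technique (Jaccard overlap of value sets for instance-based schema matching, average `difflib` string similarity with a fixed threshold for duplicate detection, and majority vote over non-missing values for conflict resolution). The sample tables are made up for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def match_schema(left_rows, right_rows):
    """Step 1 (stand-in): map each right attribute to the left attribute
    whose set of values overlaps most, measured by Jaccard similarity."""
    def values(rows, attr):
        return {str(r[attr]).lower() for r in rows if r.get(attr) is not None}

    mapping = {}
    for r_attr in right_rows[0]:
        scores = {}
        for l_attr in left_rows[0]:
            a, b = values(left_rows, l_attr), values(right_rows, r_attr)
            scores[l_attr] = len(a & b) / len(a | b) if a | b else 0.0
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # leave attributes with no overlap unmapped
            mapping[r_attr] = best
    return mapping


def similarity(a, b):
    """Average string similarity over attributes filled in on both records."""
    shared = [k for k in a if a.get(k) is not None and b.get(k) is not None]
    if not shared:
        return 0.0
    ratios = [
        SequenceMatcher(None, str(a[k]).lower(), str(b[k]).lower()).ratio()
        for k in shared
    ]
    return sum(ratios) / len(ratios)


def detect_duplicates(rows, threshold=0.8):
    """Step 2 (stand-in): compare all pairs and cluster records whose
    similarity reaches the (assumed) threshold, using union-find."""
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if similarity(rows[i], rows[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i, row in enumerate(rows):
        clusters.setdefault(find(i), []).append(row)
    return list(clusters.values())


def fuse(cluster):
    """Step 3 (stand-in): per attribute, keep the most frequent non-missing
    value in the cluster; an attribute missing everywhere stays None."""
    fused = {}
    for attr in dict.fromkeys(k for row in cluster for k in row):
        vals = [row[attr] for row in cluster if row.get(attr) is not None]
        fused[attr] = Counter(vals).most_common(1)[0][0] if vals else None
    return fused


def fuse_tables(left_rows, right_rows, threshold=0.8):
    """Run the three steps in order on two lists of dict records."""
    mapping = match_schema(left_rows, right_rows)
    renamed = [{mapping.get(k, k): v for k, v in row.items()} for row in right_rows]
    clusters = detect_duplicates(left_rows + renamed, threshold)
    return [fuse(c) for c in clusters]


# Made-up sample data: two tables with different attribute names,
# overlapping entities, and missing values.
table_a = [
    {"name": "Ann Lee", "city": "Berlin", "age": 24},
    {"name": "Bob Ray", "city": "Potsdam", "age": 31},
]
table_b = [
    {"fullname": "Ann Lee", "town": "Berlin", "years": None},
    {"fullname": "Bob Ray", "town": None, "years": 31},
    {"fullname": "Cy Moss", "town": "Potsdam", "years": 40},
]

for record in fuse_tables(table_a, table_b):
    print(record)
```

The pairwise comparison in `detect_duplicates` is quadratic in the number of records and the greedy attribute mapping may assign two attributes to the same target, so this is only suitable for showing how the steps feed into each other, not for data of realistic size.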
