Exploiting relationships for object consolidation

Wed, 09/13/2006 - 15:15 — Anonymous

Authors:

Chen, Z; Kalashnikov, DV; Mehrotra, S

Year:

2005

Venue:

Proceedings of the 2nd international workshop on Information

URL:

http://portal.acm.org/citation.cfm?id=1077512

Citations:

Citations range:

50 - 99

Attachment	Size
Chen2005Exploitingrelationshipsfor.pdf	588.78 KB

Data mining practitioners frequently have to spend a significant portion of their project time on data preprocessing before they can apply their algorithms to real-world datasets. Such preprocessing is required because many real-world datasets are not perfect: they contain missing, erroneous, and duplicate data, along with other data cleaning problems. It is a well-established fact that, in general, if such problems are not corrected, applying data mining algorithms can lead to wrong results. This is known as the "garbage in, garbage out" principle. Given the significance of the problem, numerous data cleaning techniques have been designed in the past to address these problems.

In this paper, we address one of the data cleaning challenges, called object consolidation. This important challenge arises because objects in datasets are frequently represented via descriptions (sets of instantiated attributes), which alone might not always uniquely identify the object. The goal of object consolidation is to correctly consolidate (i.e., to group/determine) all the representations of the same object, for each object in the dataset. In contrast to traditional domain-independent data cleaning techniques, our approach analyzes not only object features but also additional semantic information, namely inter-object relationships, for the purpose of object consolidation. The approach views datasets as attributed relational graphs (ARGs) of object representations (nodes) connected via relationships (edges). The approach then applies graph partitioning techniques to accurately cluster object representations. Our empirical study over real datasets shows that analyzing relationships significantly improves the quality of the result.
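To make the idea of combining object features with relationships concrete, the following is a minimal Python sketch, not the method from the paper. It assumes a toy author-mention dataset, a crude name-based feature score, a Jaccard score over shared co-authors as the relationship evidence, and a simple threshold-plus-union-find grouping as a stand-in for the graph partitioning step. All names, weights, data, and the threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each representation has a feature (name) and
# relationships (the co-authors it is connected to).
reps = {
    "r1": {"name": "D. Kalashnikov", "coauthors": {"S. Mehrotra", "Z. Chen"}},
    "r2": {"name": "Dmitri Kalashnikov", "coauthors": {"S. Mehrotra", "Z. Chen"}},
    "r3": {"name": "D. Kalashnikov", "coauthors": {"A. Smith"}},
    "r4": {"name": "S. Mehrotra", "coauthors": {"D. Kalashnikov"}},
}


def name_key(name):
    """Reduce a name to (first initial, last name); a crude assumed feature."""
    parts = name.lower().replace(".", "").split()
    return parts[0][0], parts[-1]


def feature_similarity(a, b):
    """1.0 if the reduced names match, else 0.0."""
    return 1.0 if name_key(a["name"]) == name_key(b["name"]) else 0.0


def relationship_similarity(a, b):
    """Jaccard overlap of the two representations' co-author sets."""
    union = a["coauthors"] | b["coauthors"]
    return len(a["coauthors"] & b["coauthors"]) / len(union) if union else 0.0


def consolidate(reps, threshold=0.7):
    """Group representations whose combined score reaches the threshold.

    Union-find over thresholded pairs is a simplification used here in
    place of a real graph partitioning algorithm.
    """
    parent = {r: r for r in reps}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(reps, 2):
        score = (0.5 * feature_similarity(reps[u], reps[v])
                 + 0.5 * relationship_similarity(reps[u], reps[v]))
        if score >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for r in reps:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())


clusters = consolidate(reps)
print(clusters)
```

In this toy setup, r1, r2, and r3 are indistinguishable by the name feature alone. Only the relationship score separates r3, which shares no co-authors with the other two, and that is the kind of evidence the abstract argues for.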
