Column Heterogeneity as a Measure of Data Quality

Tue, 09/26/2006 - 16:41 — thor

Authors:

Dai, B. T.; Koudas, N.; Ooi, B. C.; Srivastava, D.; Venkatasubramanian, S.

Year:

2006

Venue:

Clean DB, 2006

URL:

http://pike.psu.edu/cleandb06/papers/CameraReady_111.pdf

Citations range:

1 - 9

Attachment:

Dai2006ColumnHeterogeneityasa.pdf (121.02 KB)

Data quality is a serious concern in every data management application,
and a variety of quality measures have been proposed, including
accuracy, freshness and completeness, to capture the common
sources of data quality degradation. We identify and focus
attention on a novel measure, column heterogeneity, that seeks to
quantify the data quality problems that can arise when merging data
from different sources. We identify desiderata that a column heterogeneity
measure should intuitively satisfy, and discuss a promising
direction of research to quantify database column heterogeneity
based on a novel combination of cluster entropy and soft clustering.
Finally, we present a few preliminary experimental results,
using diverse data sets of semantically different types, to demonstrate
that this approach appears to provide a robust mechanism for
identifying and quantifying database column heterogeneity.
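The abstract names cluster entropy and soft clustering but does not say how either is computed. The following is therefore only a minimal Python sketch of the general idea: score a column by the entropy of a soft clustering of its values. Everything here is an assumption and not the paper's method. That includes the character-class featurization (`features`), the soft k-means stand-in (`soft_kmeans`) with its `k` and `temperature` parameters, and the choice to take the Shannon entropy of the marginal cluster distribution (`cluster_entropy`, `heterogeneity`). All of these names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def features(value: str) -> Vector:
    """Hypothetical featurization: fractions of digits, letters and other characters."""
    n = max(len(value), 1)  # avoid division by zero for empty strings
    digits = sum(ch.isdigit() for ch in value) / n
    letters = sum(ch.isalpha() for ch in value) / n
    return [digits, letters, 1.0 - digits - letters]


def responsibilities(points: Sequence[Vector], centroids: Sequence[Vector],
                     temperature: float) -> List[Vector]:
    """Soft assignment: softmax over negative squared distances to each centroid."""
    rows = []
    for x in points:
        scores = [-sum((a - b) ** 2 for a, b in zip(x, mu)) / temperature
                  for mu in centroids]
        top = max(scores)  # shift by the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def soft_kmeans(points: Sequence[Vector], k: int, temperature: float = 0.05,
                iters: int = 20) -> List[Vector]:
    """Soft k-means (a stand-in, not necessarily the paper's clustering method).

    Returns a membership matrix whose rows sum to 1. Centroids start at the
    first k distinct points, so fewer than k clusters are used when the column
    has fewer than k distinct feature vectors.
    """
    centroids: List[Vector] = []
    for x in points:
        if x not in centroids:
            centroids.append(list(x))
        if len(centroids) == k:
            break
    dim = len(points[0])
    for _ in range(iters):
        resp = responsibilities(points, centroids, temperature)
        # Move each centroid to the responsibility-weighted mean of the points.
        for c in range(len(centroids)):
            weight = sum(r[c] for r in resp)
            if weight > 0:
                centroids[c] = [sum(r[c] * x[d] for r, x in zip(resp, points)) / weight
                                for d in range(dim)]
    return responsibilities(points, centroids, temperature)


def cluster_entropy(memberships: Sequence[Vector]) -> float:
    """Shannon entropy (bits) of p(c) = mean over values i of p(c | value_i)."""
    n = len(memberships)
    marginal = [sum(row[c] for row in memberships) / n
                for c in range(len(memberships[0]))]
    # Clusters with zero mass contribute nothing to the entropy.
    return sum(-p * math.log2(p) for p in marginal if p > 0)


def heterogeneity(column: Sequence[str], k: int, temperature: float = 0.05) -> float:
    """Hypothetical column-heterogeneity score: cluster entropy of a soft clustering."""
    if not column:
        raise ValueError("column must contain at least one value")
    points = [features(v) for v in column]
    return cluster_entropy(soft_kmeans(points, k, temperature))


if __name__ == "__main__":
    phones = ["5551234", "5559876", "5550000"]
    mixed = ["5551234", "alice@example.com", "bob@example.org"]
    print(heterogeneity(phones, k=2))  # homogeneous column
    print(heterogeneity(mixed, k=2))   # numbers mixed with e-mail addresses
```

Under these assumptions, the column of seven-digit numbers scores 0 bits. The mixed column scores about 0.92 bits, which is the entropy of a 1/3 versus 2/3 split between a "number" cluster and an "e-mail" cluster. The score is sensitive to the hypothetical `k` and `temperature` knobs. With `k=3`, for example, the two nearly identical e-mail centroids share their mass and inflate the entropy. This illustrates why a usable measure needs a more principled way to settle the clustering granularity than this sketch provides.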
