cs.berkeley.edu

Identity uncertainty and citation matching

Wed, 04/11/2007 - 15:20 — cat

Authors:

Pasula, H; Marthi, B; Milch, B; Russell, S; Shpitser, I

Year:

2003

Venue:

Advances in Neural Information Processing Systems (NIPS)

Identity uncertainty is a pervasive problem in real-world data analysis. It
arises whenever objects are not labeled with unique identifiers or when
those identifiers may not be perceived perfectly. In such cases, two observations
may or may not correspond to the same object. In this paper,
we consider the problem in the context of citation matching—the problem
of deciding which citations correspond to the same publication. Our
approach is based on the use of a relational probability model to define
a generative model for the domain, including models of author and title

Potter’s Wheel: An Interactive Framework for Data Cleaning and Transformation

Tue, 09/12/2006 - 15:47 — Anonymous

Authors:

Raman, V; Hellerstein, J

Year:

2001

Venue:

Proc. International Conf. on Very Large Data Bases (VLDB)

Cleaning data of errors in structure and content is important
for data warehousing and integration. Current
solutions for data cleaning involve many iterations of
data “auditing” to find errors, and long-running transformations
to fix them. Users need to endure long
waits, and often write complex transformation scripts.
We present Potter’s Wheel, an interactive data cleaning
system that tightly integrates transformation and
discrepancy detection. Users gradually build transformations
to clean the data by adding or undoing
transforms on a spreadsheet-like interface; the effect

Data Cleaning publication categorizer
