Linked movie data base

Hassanzadeh, O; Consens, M
Proc 2nd Workshop on Linked Data on the Web

The Linked Movie Database (LinkedMDB) project provides
a demonstration of the first open linked dataset connecting
several major existing (and highly popular) movie web
resources. The database exposed by LinkedMDB contains
millions of RDF triples with hundreds of thousands of RDF
links to existing web data sources that are part of the growing
Linking Open Data cloud, as well as to popular movie-related
web pages such as IMDb. LinkedMDB uses a novel
way of creating and maintaining large quantities of high
quality links by employing state-of-the-art approximate join

Linkage Query Writer

Miller, Renee; Kementsietsidis, Anastasios; Lim, Lipyeow; Wang, Min

We present Linkage Query Writer (LinQuer), a system for
generating SQL queries for semantic link discovery over relational
data. The LinQuer framework consists of (a) LinQL,
a language for specification of linkage requirements; (b) a
web interface and an API for translating LinQL queries to
standard SQL queries; and (c) an interface that assists users in
writing LinQL queries. We discuss the challenges involved in
the design and implementation of a declarative and easy-to-use
framework for discovering links between different data

Framework for Evaluating Clustering Algorithms in Duplicate Detection

Hassanzadeh, Oktie; Chiang, Fei; Miller, Renée; Lee, Hyun Chul

The presence of duplicate records is a major data quality concern in
large databases. To detect duplicates, entity resolution, also known
as duplication detection or record linkage, is used as a part of the
data cleaning process to identify records that potentially refer to
the same real-world entity. We present the Stringer system that
provides an evaluation framework for understanding what barriers
remain towards the goal of truly scalable and general-purpose duplication
detection algorithms. In this paper, we use Stringer to

Group Linkage

On, Byung-Won; Koudas, Nick; Lee, Dongwon; Srivastava, Divesh

Poor-quality data is prevalent in databases for a variety
of reasons, including transcription errors and a lack of standards
for recording database fields. To be able to query
and integrate such data, considerable recent work has focused
on the record linkage problem, i.e., determining whether two
entities represented as relational records are approximately
the same. Often entities are represented as groups of relational
records, rather than individual relational records,
e.g., households in a census survey consist of a group of persons.

Column Heterogeneity as a Measure of Data Quality

Dai, B. T.; Koudas, N.; Ooi, B. C.; Srivastava, D.; Venkatasubramanian, S.
Clean DB, 2006

Data quality is a serious concern in every data management application,
and a variety of quality measures have been proposed, including
accuracy, freshness and completeness, to capture the common
sources of data quality degradation. We identify and focus
attention on a novel measure, column heterogeneity, that seeks to
quantify the data quality problems that can arise when merging data
from different sources. We identify desiderata that a column heterogeneity
measure should intuitively satisfy, and discuss a promising
direction of research to quantify database column heterogeneity
