Bioinformatics

Some methods for blindfolded record linkage

Mon, 03/29/2010 - 14:45 — cat

Authors:

Churches, T; Christen, P

Year:

2004

Venue:

BMC Medical Informatics and Decision Making

The linkage of records which refer to the same entity in separate data collections is a common requirement in public health and biomedical research. Traditionally, record linkage techniques have required that all the identifying data in which links are sought be revealed to at least one party, often a third party. This necessarily invades personal privacy and requires complete trust in the intentions of that party and their ability to maintain security and confidentiality.


An Entity Resolution Framework for Deduplicating Proteins

Fri, 02/27/2009 - 11:25 — cat

Authors:

Lochovsky, L; Topaloglou, T

Year:

2008

Venue:

Lecture Notes in Computer Science

An important prerequisite to successfully integrating protein data is detecting duplicate records spread across different databases.

Biological data cleaning: a case study

Sun, 06/10/2007 - 10:58 — cat

Authors:

Herbert, KG; Wang, JTL

Year:

2007

Venue:

International Journal of Information Quality

As databases become more pervasive through the biological sciences, various data quality concerns are emerging. Biological databases tend to develop data quality issues regarding data legacy, data uniformity and data duplication. Due to the nature of this data, each of these problems is non-trivial and can cause many problems for the database. For biological data to be corrected and standardised, methods and frameworks must be developed to handle both structural and traditional data. This paper discusses issues concerning biological data quality with respect to data cleaning.

BIO-AJAX: an extensible framework for biological data cleaning

Wed, 03/21/2007 - 16:27 — cat

Authors:

Herbert, KG; Gehani, NH; Piel, WH; Wang, JTL; Wu, CH

Year:

2004

Venue:

ACM SIGMOD Record

As databases become more pervasive through the biological sciences, various data quality issues regarding data legacy, data uniformity and data duplication arise. Due to the nature of this data, each of these problems is non-trivial. For biological data to be corrected and standardized, new methods and frameworks must be developed. This paper proposes one such framework, called BIO-AJAX, which uses principles from data cleaning to improve data quality in biological information systems, specifically in TreeBASE.

A method for similarity-based grouping of biological data

Tue, 03/20/2007 - 12:28 — cat

Authors:

Jakoniene, V; Rundqvist, D; Lambrix, P

Year:

2006

Venue:

Proc. DILS06, LNCS 4075

Similarity-based grouping of data entries in one or more data sources is a task underlying many different data management tasks, such as structuring search results, removing redundancy in databases, and data integration. Similarity-based grouping of data entries is not a trivial task in the context of life science data sources, as the stored data is complex, highly correlated and represented at different levels of granularity. The contribution of this paper is two-fold: 1) we propose a method for similarity-based grouping, and 2) we show results from test cases.

Erkennen und Bereinigen von Datenfehlern in naturwissenschaftlichen Daten

Sat, 10/28/2006 - 14:21 — cat

Authors:

Müller, H; Weis, M; Bleiholder, J; Leser, U

Year:

2005

Venue:

Datenbankspektrum, Vol. 15

Because of the way they are produced, scientific data often carry a high degree of uncertainty. When data from different sources are integrated, these uncertainties, together with the wide-ranging syntactic and semantic heterogeneity in how data are represented, lead to conflicts that result in reduced quality of the integrated data set. Although conflicts can often be resolved definitively only by domain experts, the work of these experts can and must be supported by suitable tools.

Febrl - Freely extensible biomedical record linkage

Mon, 10/16/2006 - 14:23 — massmann

Authors:

Christen, Peter; Churches, Tim

Year:

2002

Venue:

ANU Computer Science Technical Reports

This manual describes prototype software called Febrl designed to undertake probabilistic data cleaning (or standardisation) and record linkage. Written in the Python programming language, this software aims to allow health, biomedical and other researchers to clean (standardise) and link data sets of all sizes faster, with less effort and with improved quality.

