Frameworks for entity matching: A comparison

Köpcke, H; Rahm, E
Data and Knowledge Engineering, Vol. 69 (2)

Entity matching is a crucial and difficult task for data integration. Entity matching frameworks provide several methods, and ways to combine them, to effectively solve different match tasks. In this paper, we comparatively analyze 11 proposed frameworks for entity matching. Our study considers both frameworks that utilize training data and frameworks that do not utilize training data to semiautomatically

Comparative evaluation of entity resolution approaches with FEVER

Köpcke, Hanna; Thor, Andreas; Rahm, Erhard

We present FEVER, a new evaluation platform for entity resolution
approaches. The modular structure of the FEVER framework
supports the incorporation or reconstruction of many previously
proposed approaches for entity resolution. A distinctive feature of
FEVER is that it not only evaluates traditional measures such as
precision and recall but also the effort for configuring (e.g., parameter
tuning, training) a good entity resolution approach. FEVER
thus strives for a fair comparative evaluation of different
approaches by considering both the effectiveness and configuration

Training Selection for Tuning Entity Matching

Köpcke, Hanna; Rahm, Erhard
Proc. VLDB workshop on Quality in Databases and Management of Uncertain Data (QDB/MUD 2008)

Entity matching is a crucial and difficult task for data integration.
An effective solution strategy typically has to combine several
techniques and to find suitable settings for critical configuration
parameters such as similarity thresholds. Supervised (training-based)
approaches promise to reduce the manual work for
determining (learning) effective strategies for entity matching.
However, they critically depend on training data selection which
is a difficult problem that has so far mostly been addressed
manually by human experts. In this paper we propose a training-based

MOMA - A Mapping-based Object Matching System

Thor, A.; Rahm, E.

Object matching or object consolidation is a crucial task for data integration
and data cleaning. It addresses the problem of identifying
object instances in data sources referring to the same real-world
entity. We propose a flexible framework called MOMA for mapping-based
object matching. It allows the construction of match
workflows combining the results of several matcher algorithms on
both attribute values and contextual information. The output of a
match task is an instance-level mapping that supports information

AWESOME - a Data Warehouse-based System for Adaptive Website Recommendations

Thor, A.; Rahm, E.
Proc. 30th Intl. Conf. on Very Large Databases (VLDB)

Recommendations are crucial for the success of large websites. While there are many ways to determine recommendations, the relative quality of these recommenders depends on many factors and is largely unknown. We propose a new classification of recommenders and comparatively evaluate their relative quality for a sample website. The evaluation is performed with AWESOME (Adaptive website recommendations), a new data warehouse-based recommendation system capturing and evaluating user feedback on presented recommendations.

Adaptive website recommendations with AWESOME

Thor, A; Golovin, N; Rahm, E
The VLDB Journal: The International Journal on Very Large Databases

Recommendations are crucial for the success of large websites. While there are many ways to determine recommendations, the relative quality of these recommenders depends on many factors and is largely unknown. We present the architecture and implementation of AWESOME (Adaptive website recommendations), a data warehouse-based recommendation system. It allows the coordinated use of a large number of recommenders to automatically generate website recommendations.

Citation analysis of database publications

Rahm, E.; Thor, A.
ACM SIGMOD Record, 34, 2005

We analyze citation frequencies for two main database conferences
(SIGMOD, VLDB) and three database journals (TODS, VLDB
Journal, SIGMOD Record) over 10 years. The citation data is obtained
by integrating and cleaning data from DBLP and Google Scholar.
Our analysis considers different comparative metrics per
publication venue, in particular the total and average number of citations
as well as the impact factor which has so far only been considered
for journals. We also determine the most cited papers,
authors, author institutions and their countries.
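
As a rough illustration of the per-venue metrics named in this abstract, the sketch below computes total and average citations and a two-year impact factor. It assumes the standard two-year definition (citations made in a given year to papers the venue published in the two preceding years, divided by the number of those papers); the abstract does not spell out the exact formula the paper uses. The records, venue names, and function names are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy records (not data from the paper):
# (venue, publication year, {citing year: citations received in that year}).
papers = [
    ("VenueA", 2001, {2002: 3, 2003: 5}),
    ("VenueA", 2002, {2003: 4}),
    ("VenueB", 2002, {2003: 1}),
]

def venue_totals(papers):
    """Return {venue: (total citations, average citations per paper)}."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for venue, _year, cites in papers:
        totals[venue] += sum(cites.values())
        counts[venue] += 1
    return {v: (totals[v], totals[v] / counts[v]) for v in totals}

def impact_factor(papers, venue, year):
    """Two-year impact factor (assumed definition): citations made in `year`
    to papers the venue published in the two preceding years, divided by
    the number of those papers. Returns None if the venue has no such papers."""
    window = [c for v, y, c in papers if v == venue and year - 2 <= y <= year - 1]
    if not window:
        return None
    return sum(c.get(year, 0) for c in window) / len(window)
```

Calling `venue_totals(papers)` gives the total and average per venue, and `impact_factor(papers, "VenueA", 2003)` applies the same ratio to a conference-style venue, which is the extension the abstract describes.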

iFuice--Information Fusion utilizing Instance Correspondences and Peer Mappings

Rahm, E.; Thor, A.; Aumueller, D.; Do, H.H.; Golovin, N.; Kirsten, T.
Proc. 8th WebDB, 2005

We present a new approach to information fusion of web data
sources. It is based on peer-to-peer mappings between sources and
utilizes correspondences between their instances. Such correspondences
are already available between many sources, e.g., in the
form of web links, and help combine the information about specific
objects and support a high-quality data fusion. Sources and
mappings relate to a domain model to support a semantically focused
information fusion. The iFuice architecture incorporates a
mapping mediator offering both an interactive and a script-driven,

Evaluierung von Data Warehouse-Werkzeugen

Do, H. H.; Stöhr, T.; Rahm, E.; Müller, R.
Proc. Data Warehousing (DW)

The growing importance of data warehouse solutions for decision support in large enterprises has led to a bewildering variety of software products. Current data warehouse projects show that success also depends on choosing suitable tools for this complex and cost-intensive environment. We present a method for evaluating data warehouse tools that combines assessment against a criteria catalog with detailed practical tests.

BioFuice: Mapping-based data integration in bioinformatics

Kirsten, T.; Rahm, E.
Proc. Int. Workshop on Data Integration in the Life Sciences (DILS), 2006

We introduce the BioFuice approach for integrating data from different private and public data sources and ontologies. BioFuice follows a peer-to-peer-like data integration based on bidirectional mappings. Sources and mappings are associated with a domain model to support a semantically meaningful interoperability. BioFuice extends the generic iFuice integration platform, which utilizes specific operators for data fusion and workflow-like script programs. BioFuice supports explorative data analysis as well as query and search capabilities.
