Febrl - A freely available record linkage system with a graphical user interface

Christen, Peter
Australasian Workshop Health Data and Knowledge Management
Record or data linkage is an important enabling technology in the health sector, as linked data is a cost-effective resource that can help to improve research into health policies, detect adverse drug reactions, reduce costs, and uncover fraud within the health system. Significant advances, mostly originating from data mining and machine learning, have been made in recent years in many areas of record linkage techniques. Most of these new methods are not yet implemented in current record linkage systems, or are hidden within ‘black box’ commercial software. This makes it difficult for users to learn about new record linkage techniques, as well as to compare existing linkage techniques with new ones. What is required are flexible tools that enable users to experiment with new record linkage techniques at low cost.
This paper describes the Febrl (Freely Extensible Biomedical Record Linkage) system, which is available under an open source software licence. It contains many recently developed advanced techniques for data cleaning and standardisation, indexing (blocking), field comparison, and record pair classification, and encapsulates them into a graphical user interface. Febrl can be seen as a training tool suitable for users to learn and experiment with both traditional and new record linkage techniques, as well as for practitioners to conduct linkages with data sets containing up to several hundred thousand records.
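
As a rough illustration of how the four steps named above (cleaning and standardisation, indexing or blocking, field comparison, and record pair classification) fit together, the following minimal Python sketch deduplicates a tiny list of records. It is not Febrl's API and does not reproduce any of its techniques: the function names, the field names (given_name, surname, postcode), the blocking key, the similarity measure (the Python standard library's difflib.SequenceMatcher) and the threshold of 2.5 are all assumptions chosen for illustration, and the sample records are invented.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def standardise(record):
    """Cleaning and standardisation: lowercase each field and collapse whitespace."""
    return {key: " ".join(str(value).lower().split()) for key, value in record.items()}


def block_key(record):
    """Indexing (blocking): hypothetical key, first surname letter plus postcode."""
    return record["surname"][:1] + record["postcode"]


def compare(rec_a, rec_b, fields):
    """Field comparison: one approximate string similarity in [0, 1] per field."""
    return [SequenceMatcher(None, rec_a[f], rec_b[f]).ratio() for f in fields]


def classify(similarities, threshold):
    """Record pair classification: simple threshold on the summed similarities."""
    return "match" if sum(similarities) >= threshold else "non-match"


def link(records, fields=("given_name", "surname", "postcode"), threshold=2.5):
    """Run the four steps and return a classification for each candidate pair."""
    cleaned = [standardise(r) for r in records]

    # Only records that share a blocking key are compared with each other.
    blocks = defaultdict(list)
    for index, record in enumerate(cleaned):
        blocks[block_key(record)].append(index)

    results = {}
    for members in blocks.values():
        for i, j in combinations(members, 2):
            similarities = compare(cleaned[i], cleaned[j], fields)
            results[(i, j)] = classify(similarities, threshold)
    return results


# Invented sample records, for illustration only.
records = [
    {"given_name": "Gail", "surname": "Miller", "postcode": "2600"},
    {"given_name": "gail ", "surname": "MILLER", "postcode": "2600"},
    {"given_name": "Mike", "surname": "Mason", "postcode": "2600"},
    {"given_name": "Mary", "surname": "Smith", "postcode": "2000"},
]

results = link(records)
for pair, label in sorted(results.items()):
    print(pair, label)
```

In this sketch, blocking keeps the fourth record out of every comparison because it shares its key with no other record, which is the reason indexing reduces the number of record pairs that need to be compared. A real system would replace the summed-similarity threshold with a more principled classifier.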