Tagging of name records for genealogical data browsing

Thu, 01/08/2009 - 10:42 — cat

Authors:

Perrow, Mike; Barber, David

Year:

2008

Venue:

Proc. 6th ACM/IEEE-CS joint conference on Digital libraries

URL:

http://portal.acm.org/citation.cfm?id=1141753.1141827

DOI:

10.1145/1141753.1141827

Citations range:

1 - 9

Attachment	Size
Perrow2008Taggingofnamerecordsforgenealogicaldatabrowsing.pdf	890.95 KB

In this paper we present a method of parsing unstructured textual records, each briefly describing a person and their direct relatives, which we use in the construction of a browsing tool for genealogical data. The records have been created by researchers who are currently digitising a collection of historical archives stored at the Abbaye de Saint-Maurice, Switzerland. The string 'Beatrix, daughter of Johannes Trona, of Saillon' is a typical example of a record. We wish to annotate every term (word or symbol) in our records with a label that describes whether the term is a name (e.g. 'Beatrix'), a place (e.g. 'Saillon'), or a relationship (e.g. 'daughter'). Using this information, we are able to derive both a canonical form for each name (e.g. 'Beatrix Trona') and the relationships between people. We build upon work developed for the cleaning and standardization of names for record linkage corpora, adding several enhancements to deal with our more difficult data, which contains name structures common in French, Italian and Latin and spans hundreds of years. We present an approach to this problem that works interactively with a user to annotate the data set accurately, greatly reducing the human effort required. We do this by learning a Hidden Markov Model that represents the structure of a record, and by finding structural patterns in new records. Finally, we present a brief overview of a tool we are developing to help genealogical researchers browse and search the data.
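
To make the tagging step concrete, the following is a minimal sketch of how a Hidden Markov Model can label the terms of a record using Viterbi decoding. It is not the authors' model. The label set (NAME, REL, PLACE, SEP), the coarse observation classes, and every probability below are hand-picked assumptions for illustration, whereas the paper describes learning the model interactively with a user. In this toy setup the word 'of' is absorbed into the relationship or place phrase it introduces.

```python
import math
import re

# Hypothetical label set and hand-set parameters (assumptions, not from the paper).
STATES = ["NAME", "REL", "PLACE", "SEP"]
KIN_WORDS = {"daughter", "son", "wife", "widow", "brother", "sister"}

START = {"NAME": 1.0}  # assume every record begins with a name
TRANS = {
    "NAME": {"NAME": 0.4, "SEP": 0.6},
    "SEP": {"REL": 0.5, "PLACE": 0.5},
    "REL": {"REL": 0.5, "NAME": 0.5},
    "PLACE": {"PLACE": 0.8, "SEP": 0.2},
}
EMIT = {
    "NAME": {"CAP": 1.0},
    "REL": {"KIN": 0.8, "OF": 0.2},
    "PLACE": {"OF": 0.5, "CAP": 0.5},
    "SEP": {"PUNCT": 1.0},
}


def log(p):
    """Log-probability, mapping zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def tokenize(record):
    """Split a record into terms: words and single punctuation symbols."""
    return re.findall(r"\w+|[^\w\s]", record)


def observe(term):
    """Map a term to a coarse observation class (an illustrative choice)."""
    if term.lower() in KIN_WORDS:
        return "KIN"
    if term.lower() == "of":
        return "OF"
    if not term[0].isalnum():
        return "PUNCT"
    if term[0].isupper():
        return "CAP"
    return "OTHER"


def viterbi(terms):
    """Return the most probable label sequence for the given terms."""
    if not terms:
        return []
    obs = [observe(t) for t in terms]
    score = {s: log(START.get(s, 0)) + log(EMIT[s].get(obs[0], 0)) for s in STATES}
    back = []  # one back-pointer table per term after the first
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            # Best predecessor of state s at this position.
            best = max(STATES, key=lambda p: prev[p] + log(TRANS[p].get(s, 0)))
            score[s] = prev[best] + log(TRANS[best].get(s, 0)) + log(EMIT[s].get(o, 0))
            pointers[s] = best
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(STATES, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    terms = tokenize("Beatrix, daughter of Johannes Trona, of Saillon")
    for term, label in zip(terms, viterbi(terms)):
        print(f"{term}\t{label}")
```

With these toy parameters, 'Johannes' and 'Trona' are labelled NAME because they follow a relationship phrase, while 'Saillon' is labelled PLACE because the path through 'of' as a place marker scores higher after a separator. A learned model would replace the hand-set tables with probabilities estimated from annotated records.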
