Identification of Real-World Objects in Multiple Databases

Wed, 10/11/2006 - 10:54 — cat

Authors:

Neiling, M

Year:

2005

Venue:

TR, TU Berlin

URL:

http://www.cis.cs.tu-berlin.de/~mneiling/publications/Neiling@Gfkl2005.pdf

Attachment	Size
Neiling2005IdentificationofRealWorld.pdf	230.54 KB

Object identification is an important issue in the integration of data from
different sources. The task is complicated if the sources share no global, consistent
identifier. In that case, object identification can only be performed
using the identifying information that the objects' data itself provides. Unfortunately,
real-world data is dirty, so identification mechanisms such as natural keys mostly fail;
we have to account for variations and errors in the data. Consequently, object
identification can no longer be guaranteed to be fault-free. Several methods tackle
the object identification problem, e.g. Record Linkage or the Sorted Neighborhood
Method.
Based on a novel object identification framework, we assessed data quality and
evaluated different methods on real data. One main result is that scalability is
determined by the applied preselection technique and the use of efficient data
structures. Another result is that Decision Tree Induction achieves
better correctness and is more robust than Record Linkage.
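
To illustrate why preselection matters for scalability, the following is a minimal sketch of the general idea behind the Sorted Neighborhood Method: records are sorted by a key and only records within a small sliding window are compared, instead of all pairs. It is not the framework evaluated in the paper. The function names, the sample records, the window size, the 0.85 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def sorted_neighborhood_pairs(records, key, window=3):
    """Yield candidate index pairs from a sliding window over the sorted records.

    Each record is paired only with the next (window - 1) records in sort
    order, so the number of comparisons grows roughly linearly with the
    number of records rather than quadratically.
    """
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            yield (min(i, j), max(i, j))


def similarity(a, b):
    """String similarity in [0, 1]; an illustrative stand-in, not the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def match(records, key, window=3, threshold=0.85):
    """Return candidate pairs whose key strings are similar enough (assumed threshold)."""
    return sorted(
        (i, j)
        for i, j in sorted_neighborhood_pairs(records, key, window)
        if similarity(key(records[i]), key(records[j])) >= threshold
    )


# Hypothetical sample data with spelling variations of the same real-world objects.
records = [
    {"name": "Jon Smith", "city": "Berlin"},
    {"name": "Maria Garcia", "city": "Madrid"},
    {"name": "John Smith", "city": "Berlin"},
    {"name": "Maria Garzia", "city": "Madrid"},
]


def name_key(record):
    return record["name"].lower()


candidates = list(sorted_neighborhood_pairs(records, name_key, window=2))
matches = match(records, name_key, window=2)
print(len(candidates), "candidate pairs instead of 6")
print(matches)
```

With a window of 2, only three of the six possible pairs are compared. The trade-off is that true matches whose keys sort far apart are never compared, which is one reason identification cannot be guaranteed to be fault-free.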
