Duplicate record elimination in large data files

Authors: 
Bitton, D.; DeWitt, D.J.
Year: 
1983
Venue: 
ACM Transactions on Database Systems (TODS), 8, 1983
URL: 
http://portal.acm.org/citation.cfm?id=319987&dl=
Citations: 
208
Citations range: 
100 - 499
Attachment: 
Bitton1983Duplicaterecordeliminationin.pdf (727.81 KB)

This paper addresses duplicate elimination in large data files that may contain many occurrences of the same record. It presents a comprehensive cost analysis of the duplicate elimination operation, based on a combinatorial model for estimating the size of the intermediate runs produced by a modified merge-sort procedure. This modified merge-sort, which discards duplicates as sorted runs are generated and merged, is shown to significantly outperform the standard technique of sorting the file and then making a sequential pass to locate duplicate records. The results can also provide critical input to the query optimizer of a relational database system.
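
A minimal in-memory sketch of the core idea, not the authors' implementation: duplicates are dropped both while sorted runs are formed and while the runs are merged, so later phases process shorter runs and no separate post-sort pass is needed. The Python code, the run size, and the sample records are illustrative; the paper concerns external merge-sort of large files on disk.

import heapq

def make_runs(records, run_size):
    """Phase 1: sort fixed-size chunks into runs, dropping
    duplicates within each run as it is written out."""
    runs = []
    for i in range(0, len(records), run_size):
        chunk = sorted(records[i:i + run_size])
        run = [chunk[0]]
        for r in chunk[1:]:
            if r != run[-1]:        # adjacent equal keys are duplicates
                run.append(r)
        runs.append(run)
    return runs

def merge_runs(runs):
    """Phase 2: merge the sorted runs, emitting each distinct
    record once, instead of sorting fully and then scanning."""
    out = []
    for r in heapq.merge(*runs):
        if not out or r != out[-1]:
            out.append(r)
    return out

records = [3, 1, 3, 2, 5, 1, 4, 2, 5, 3]
print(merge_runs(make_runs(records, 4)))   # [1, 2, 3, 4, 5]

Because duplicates vanish during run generation, the intermediate runs shrink; estimating how much they shrink is what the paper's combinatorial model quantifies, and that estimate drives the cost analysis.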