The paper analyzes the problem of data cleansing and the automatic identification of
potential errors in data sets. An overview of the small body of existing literature
concerning data cleansing is given. Methods for error detection that go beyond integrity
analysis are reviewed. The applicable methods include statistical outlier
detection, pattern matching, clustering, and data mining techniques. Brief results
supporting the use of such methods are given. The future research directions necessary to
address the data cleansing problem are discussed.