Robust and efficient fuzzy match for online data cleaning

Wed, 09/13/2006 - 15:09 — cat

Authors:

Chaudhuri, S.; Ganjam, K.; Ganti, V.; Motwani, R.

Year:

2003

Venue:

Proc. ACM SIGMOD 2003

URL:

http://portal.acm.org/citation.cfm?id=872757.872796

Citations:

378

Attachment	Size
Chaudhuri2003Robustandefficientfuzzymatchforonlinedatacleaning.pdf	265.11 KB

To ensure high data quality, data warehouses must validate and cleanse incoming data tuples from external sources. In many situations, clean tuples must match acceptable tuples in reference tables. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A significant challenge in such a scenario is to implement an efficient and accurate fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any tuple in the reference relation. In this paper, we propose a new similarity function which overcomes limitations of commonly used similarity functions, and develop an efficient fuzzy match algorithm. We demonstrate the effectiveness of our techniques by evaluating them on real datasets.
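As a rough illustration of the fuzzy match operation the abstract describes, the sketch below matches an incoming (name, description) tuple against a small reference relation. It is not the similarity function or the algorithm proposed in the paper: it assumes plain Jaccard similarity over character 3-grams, averaged across fields, and scans the reference relation linearly. The function names, the threshold of 0.5, and the sample product rows are all hypothetical.

```python
def qgrams(text, q=3):
    """Return the set of lowercase character q-grams of text.

    The string is padded with '#' so that very short strings still
    produce at least one q-gram.
    """
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity of the q-gram sets of two strings (0.0 to 1.0)."""
    grams_a, grams_b = qgrams(a), qgrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def tuple_similarity(input_tuple, ref_tuple):
    """Average the per-field similarities (an assumed, unweighted scheme)."""
    scores = [jaccard(x, y) for x, y in zip(input_tuple, ref_tuple)]
    return sum(scores) / len(scores)


def fuzzy_match(input_tuple, reference, threshold=0.5):
    """Return (best reference tuple, score), or (None, score) below threshold.

    This is a linear scan, which is fine for a sketch but not for a
    large reference relation.
    """
    best = max(reference, key=lambda ref: tuple_similarity(input_tuple, ref))
    score = tuple_similarity(input_tuple, best)
    return (best, score) if score >= threshold else (None, score)


# Hypothetical product reference relation and a dirty incoming tuple.
REFERENCE = [
    ("Acme Widget", "Steel widget, 10 cm"),
    ("Acme Gadget", "Plastic gadget, blue"),
]
DIRTY = ("Acme Widgit", "Steel widget 10cm")

if __name__ == "__main__":
    print(fuzzy_match(DIRTY, REFERENCE))
```

A tuple that matches a reference row exactly scores 1.0, a misspelled one is mapped to its closest reference row, and an unrelated one falls below the threshold and is left unmatched for other handling.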
