cs.columbia.edu

Text joins in an RDBMS for web data integration

Mon, 10/09/2006 - 12:46 — thor

Authors: Gravano, L.; Ipeirotis, P.G.; Koudas, N.; Srivastava, D.

Year: 2003

Venue: Proceedings of the Twelfth International Conference on World Wide Web, 2003

The integration of data produced and collected across autonomous, heterogeneous web services is an increasingly important and challenging problem. Due to the lack of global identifiers, the same entity (e.g., a product) might have different textual representations across databases. Textual data is also often noisy because of transcription errors, incomplete information, and lack of standard formats. A fundamental task during data integration is matching of strings that refer to the same entity.

Approximate string joins in a database (almost) for free

Mon, 10/09/2006 - 12:43 — thor

Authors: Gravano, L.; Ipeirotis, P.G.; Jagadish, H.V.; Koudas, N.; Muthukrishnan, S.; Srivastava, D.

Year: 2001

Venue: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB), 2001

String data is ubiquitous, and its management has taken on particular importance in the past few years. Approximate queries are very important on string data especially for more complex queries involving joins. This is due, for example, to the prevalence of typographical errors in data, and multiple conventions for recording attributes such as name and address. Commercial databases do not support approximate string joins directly, and it is a challenge to implement this functionality efficiently with user-defined functions (UDFs). In this paper, we develop a technique for building
