SimFusion: measuring similarity using unified relationship matrix

Authors: 
Xi, W; Fox, EA; Fan, W; Zhang, B; Chen, Z; Yan, J; J Yan, D
Year: 
2005
Venue: 
Proc. of the 28th annual international ACM SIGIR conf.
URL: 
http://portal.acm.org/citation.cfm?id=1076034.1076059
Citations: 
75
Citations range: 
50 - 99
Attachment: 
Xi2005SimFusionmeasuringsimilarity.pdf (130.63 KB)

In this paper we use a Unified Relationship Matrix (URM) to
represent a set of heterogeneous data objects (e.g., web pages,
queries) and their interrelationships (e.g., hyperlinks, user clickthrough
sequences). We claim that iterative computations over the
URM can help overcome the data sparseness problem and detect
latent relationships among heterogeneous data objects, and can thus
improve the quality of information applications that must combine
information from heterogeneous sources. To support
our claim, we present a unified similarity-calculating algorithm,
SimFusion. By iteratively computing over the URM, SimFusion
can effectively integrate relationships from heterogeneous sources
when measuring the similarity of two data objects. Experiments
based on a web search engine query log and a web page collection
demonstrate that SimFusion can improve similarity measurement
of web objects over both traditional content-based algorithms and
the cutting-edge SimRank algorithm.
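
The iterative computation over the URM described in the abstract can be illustrated with a minimal sketch. The abstract does not state the update rule, so the code assumes one: the similarity matrix S starts as the identity and is repeatedly replaced by L * S * L^T, where L is a row-normalized URM assembled from weighted relationship blocks. The helper names (normalize_rows, build_urm, simfusion), the block weights of 0.5, and the toy data for two queries and three pages are hypothetical, chosen only for illustration.

```python
def normalize_rows(matrix):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def build_urm(blocks, weights):
    """Assemble one matrix from relationship blocks (assumed layout).

    blocks[i][j] relates objects of type i to objects of type j;
    weights[i][j] is the (hypothetical) importance of that block.
    Each block is row-normalized before weighting.
    """
    urm = []
    for block_row, weight_row in zip(blocks, weights):
        normalized = [normalize_rows(b) for b in block_row]
        for r in range(len(normalized[0])):
            row = []
            for block, w in zip(normalized, weight_row):
                row.extend(w * v for v in block[r])
            urm.append(row)
    return urm


def simfusion(urm, iterations=10):
    """Assumed update rule: S <- L * S * L^T, starting from identity."""
    n = len(urm)
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    urm_t = transpose(urm)
    for _ in range(iterations):
        sim = matmul(matmul(urm, sim), urm_t)
    return sim


# Toy data (invented): 2 queries, 3 pages.
query_query = [[1, 0], [0, 1]]            # no query-query links, self only
query_page = [[2, 1, 0], [0, 1, 3]]       # clickthrough counts
page_query = transpose(query_page)
page_page = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # hyperlinks

urm = build_urm([[query_query, query_page], [page_query, page_page]],
                [[0.5, 0.5], [0.5, 0.5]])
sim = simfusion(urm, iterations=5)

# Objects are indexed queries first, then pages: q0, q1, p0, p1, p2.
print("similarity(q0, q1) =", round(sim[0][1], 4))
```

In this toy setup the two queries have no direct relationship, yet they receive a nonzero similarity because both lead to clicks on the same page. That is the kind of latent relationship the abstract says iteration over the URM can surface.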