Fast all-pairs SimRank assessment on large graphs and bipartite domains

Weiren Yu, Xuemin Lin, Wenjie Zhang, Julie A. McCann

Research output: Contribution to journal › Article

Abstract

SimRank is a powerful model for assessing vertex-pair similarities in a graph. It follows the concept that two vertices are similar if they are referenced by similar vertices. The prior work [18] exploits partial sums memoization to compute SimRank in O(Kmn) time on a graph of n vertices and m edges, for K iterations. However, computations among different partial sums may have redundancy. Besides, to guarantee a given accuracy ε, the existing SimRank needs K = ⌈log_C ε⌉ iterations, where C is a damping factor, but the geometric rate of convergence is slow if a high accuracy is expected. In this paper, (1) a novel clustering strategy is proposed to eliminate duplicate computations occurring in partial sums, and an efficient algorithm is then devised to accelerate SimRank computation to O(Kd'n²) time, where d' is typically much smaller than m/n. (2) A new differential SimRank equation is proposed, which can represent the SimRank matrix as an exponential sum of transition matrices, as opposed to the geometric sum of the conventional counterpart. This leads to a further speedup in the convergence rate of SimRank iterations. (3) In bipartite domains, a novel finer-grained partial max clustering method is developed to speed up the computation of the Minimax SimRank variation from O(Kmn) to O(Km'n) time, where m' (≤ m) is the number of edges in a reduced graph after edge clustering, which is typically much smaller than m. Using real and synthetic data, we empirically verify that (1) our approach of partial sums sharing outperforms the best known algorithm by up to one order of magnitude; (2) the revised notion of SimRank further achieves a 5X speedup on large graphs while also fairly preserving the relative order of original SimRank scores; (3) our finer-grained partial max memoization for the Minimax SimRank variation in bipartite domains is 5X-12X faster than the baselines.
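
The baseline that the abstract attributes to prior work [18] can be illustrated with a minimal Python sketch. It is not the paper's clustering algorithm: it only shows the conventional SimRank iteration with partial sums memoization, which runs in O(Kmn) time, together with the iteration count K = ⌈log_C ε⌉. The function names, the `in_neighbors` dictionary layout, and the default value of C are assumptions made for illustration and do not come from the paper.

```python
import math


def iterations_for_accuracy(C, eps):
    """Number of iterations K = ceil(log_C eps) for damping factor 0 < C < 1."""
    return math.ceil(math.log(eps) / math.log(C))


def simrank_partial_sums(in_neighbors, C=0.6, K=5):
    """All-pairs SimRank with partial sums memoization (illustrative sketch).

    in_neighbors maps every vertex to the list of vertices that point to it
    (hypothetical input layout chosen for this example).
    """
    nodes = list(in_neighbors)
    # s_0(a, b) = 1 if a == b, else 0
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(K):
        new_sim = {}
        for a in nodes:
            Ia = in_neighbors[a]
            # Memoize partial[j] = sum of sim[i][j] over i in I(a);
            # it is computed once per a and reused for every b.
            partial = {j: sum(sim[i][j] for i in Ia) for j in nodes}
            row = {}
            for b in nodes:
                Ib = in_neighbors[b]
                if a == b:
                    row[b] = 1.0
                elif not Ia or not Ib:
                    # A vertex without in-neighbors is similar only to itself.
                    row[b] = 0.0
                else:
                    total = sum(partial[j] for j in Ib)
                    row[b] = C * total / (len(Ia) * len(Ib))
            new_sim[a] = row
        sim = new_sim
    return sim


# Tiny example: vertex "u" references both "a" and "b".
graph = {"u": [], "a": ["u"], "b": ["u"]}
scores = simrank_partial_sums(graph, C=0.6, K=5)
print(scores["a"]["b"])
print(iterations_for_accuracy(0.6, 0.001))
```

In this toy graph, "a" and "b" share their only in-neighbor, so their score equals the damping factor C after the first iteration and stays there. The paper's contribution, as the abstract describes it, is to go further by sharing computation among different partial sums, which this sketch does not attempt.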
Original language: English
Pages (from-to): 1810-1823
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 27
Issue number: 7
Early online date: 15 Jul 2014
DOI: 10.1109/TKDE.2014.2339828
Publication status: Published - 1 Jul 2015

Fingerprint

Redundancy
Differential equations
Damping

Bibliographical note

© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Keywords

  • structural similarity
  • SimRank
  • hyperlink analysis

Cite this

Yu, Weiren ; Lin, Xuemin ; Zhang, Wenjie ; McCann, Julie A. / Fast all-pairs SimRank assessment on large graphs and bipartite domains. In: IEEE Transactions on Knowledge and Data Engineering. 2015 ; Vol. 27, No. 7. pp. 1810-1823.
@article{f336c40c2b2e45fca78696557d58de54,
title = "Fast all-pairs SimRank assessment on large graphs and bipartite domains",
abstract = "SimRank is a powerful model for assessing vertex-pair similarities in a graph. It follows the concept that two vertices are similar if they are referenced by similar vertices. The prior work [18] exploits partial sums memoization to compute SimRank in $O(Kmn)$ time on a graph of $n$ vertices and $m$ edges, for $K$ iterations. However, computations among different partial sums may have redundancy. Besides, to guarantee a given accuracy $\epsilon$, the existing SimRank needs $K = \lceil \log_C \epsilon \rceil$ iterations, where $C$ is a damping factor, but the geometric rate of convergence is slow if a high accuracy is expected. In this paper, (1) a novel clustering strategy is proposed to eliminate duplicate computations occurring in partial sums, and an efficient algorithm is then devised to accelerate SimRank computation to $O(Kd'n^2)$ time, where $d'$ is typically much smaller than $m/n$. (2) A new differential SimRank equation is proposed, which can represent the SimRank matrix as an exponential sum of transition matrices, as opposed to the geometric sum of the conventional counterpart. This leads to a further speedup in the convergence rate of SimRank iterations. (3) In bipartite domains, a novel finer-grained partial max clustering method is developed to speed up the computation of the Minimax SimRank variation from $O(Kmn)$ to $O(Km'n)$ time, where $m'$ ($\leq m$) is the number of edges in a reduced graph after edge clustering, which is typically much smaller than $m$. Using real and synthetic data, we empirically verify that (1) our approach of partial sums sharing outperforms the best known algorithm by up to one order of magnitude; (2) the revised notion of SimRank further achieves a 5X speedup on large graphs while also fairly preserving the relative order of original SimRank scores; (3) our finer-grained partial max memoization for the Minimax SimRank variation in bipartite domains is 5X-12X faster than the baselines.",
keywords = "structural similarity, SimRank, hyperlink analysis",
author = "Yu, Weiren and Lin, Xuemin and Zhang, Wenjie and McCann, Julie A.",
note = "{\textcopyright} 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.",
year = "2015",
month = "7",
day = "1",
doi = "10.1109/TKDE.2014.2339828",
language = "English",
volume = "27",
pages = "1810--1823",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE",
number = "7",

}

TY - JOUR

T1 - Fast all-pairs SimRank assessment on large graphs and bipartite domains

AU - Yu, Weiren

AU - Lin, Xuemin

AU - Zhang, Wenjie

AU - McCann, Julie A.

N1 - © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

PY - 2015/7/1

Y1 - 2015/7/1

AB - SimRank is a powerful model for assessing vertex-pair similarities in a graph. It follows the concept that two vertices are similar if they are referenced by similar vertices. The prior work [18] exploits partial sums memoization to compute SimRank in O(Kmn) time on a graph of n vertices and m edges, for K iterations. However, computations among different partial sums may have redundancy. Besides, to guarantee a given accuracy ε, the existing SimRank needs K = ⌈log_C ε⌉ iterations, where C is a damping factor, but the geometric rate of convergence is slow if a high accuracy is expected. In this paper, (1) a novel clustering strategy is proposed to eliminate duplicate computations occurring in partial sums, and an efficient algorithm is then devised to accelerate SimRank computation to O(Kd'n²) time, where d' is typically much smaller than m/n. (2) A new differential SimRank equation is proposed, which can represent the SimRank matrix as an exponential sum of transition matrices, as opposed to the geometric sum of the conventional counterpart. This leads to a further speedup in the convergence rate of SimRank iterations. (3) In bipartite domains, a novel finer-grained partial max clustering method is developed to speed up the computation of the Minimax SimRank variation from O(Kmn) to O(Km'n) time, where m' (≤ m) is the number of edges in a reduced graph after edge clustering, which is typically much smaller than m. Using real and synthetic data, we empirically verify that (1) our approach of partial sums sharing outperforms the best known algorithm by up to one order of magnitude; (2) the revised notion of SimRank further achieves a 5X speedup on large graphs while also fairly preserving the relative order of original SimRank scores; (3) our finer-grained partial max memoization for the Minimax SimRank variation in bipartite domains is 5X-12X faster than the baselines.

KW - structural similarity

KW - SimRank

KW - hyperlink analysis

UR - http://ieeexplore.ieee.org/document/6857337/

U2 - 10.1109/TKDE.2014.2339828

DO - 10.1109/TKDE.2014.2339828

M3 - Article

VL - 27

SP - 1810

EP - 1823

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 7

ER -