Dynamical SimRank search on time-varying networks

Weiren Yu*, Xuemin Lin, Wenjie Zhang, Julie A. McCann

*Corresponding author for this work

Research output: Contribution to journalArticle

Abstract

SimRank is an appealing pair-wise similarity measure based on graph structure. It iteratively follows the intuition that two nodes are assessed as similar if they are pointed to by similar nodes. Many real graphs are large, and links are constantly subject to minor changes. In this article, we study the efficient dynamical computation of all-pairs SimRanks on time-varying graphs. Existing methods for the dynamical SimRank computation [e.g., LTSF (Shao et al. in PVLDB 8(8):838–849, 2015) and READS (Zhang et al. in PVLDB 10(5):601–612, 2017)] mainly focus on top-k search with respect to a given query. For all-pairs dynamical SimRank search, Li et al.’s approach (Li et al. in EDBT, 2010) was proposed for this problem. It first factorizes the graph via a singular value decomposition (SVD) and then incrementally maintains such a factorization in response to link updates at the expense of exactness. As a result, all pairs of SimRanks are updated approximately, yielding (Formula presented.) time and (Formula presented.) memory in a graph with n nodes, where r is the target rank of the low-rank SVD. Our solution to the dynamical computation of SimRank comprises of five ingredients: (1) We first consider edge update that does not accompany new node insertions. We show that the SimRank update (Formula presented.) in response to every link update is expressible as a rank-one Sylvester matrix equation. This provides an incremental method requiring (Formula presented.) time and (Formula presented.) memory in the worst case to update (Formula presented.) pairs of similarities for K iterations. (2) To speed up the computation further, we propose a lossless pruning strategy that captures the “affected areas” of (Formula presented.) to eliminate unnecessary retrieval. This reduces the time of the incremental SimRank to (Formula presented.), where m is the number of edges in the old graph, and (Formula presented.) is the size of “affected areas” in (Formula presented.), and in practice, (Formula presented.). (3) We also consider edge updates that accompany node insertions, and categorize them into three cases, according to which end of the inserted edge is a new node. For each case, we devise an efficient incremental algorithm that can support new node insertions and accurately update the affected SimRanks. (4) We next study batch updates for dynamical SimRank computation, and design an efficient batch incremental method that handles “similar sink edges” simultaneously and eliminates redundant edge updates. (5) To achieve linear memory, we devise a memory-efficient strategy that dynamically updates all pairs of SimRanks column by column in just (Formula presented.) memory, without the need to store all (Formula presented.) pairs of old SimRank scores. Experimental studies on various datasets demonstrate that our solution substantially outperforms the existing incremental SimRank methods and is faster and more memory-efficient than its competitors on million-scale graphs.

Original languageEnglish
Pages (from-to)79–104
Number of pages26
JournalVLDB Journal
Volume27
Issue number1
Early online date27 Nov 2017
DOIs
Publication statusPublished - 1 Feb 2018

Fingerprint

Time varying networks
Data storage equipment
Singular value decomposition
Factorization

Bibliographical note

© The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Keywords

  • Dynamical networks
  • optimization
  • Similarity search
  • SimRank computation

Cite this

Yu, W., Lin, X., Zhang, W., & McCann, J. A. (2018). Dynamical SimRank search on time-varying networks. VLDB Journal, 27(1), 79–104. https://doi.org/10.1007/s00778-017-0488-z
Yu, Weiren ; Lin, Xuemin ; Zhang, Wenjie ; McCann, Julie A. / Dynamical SimRank search on time-varying networks. In: VLDB Journal. 2018 ; Vol. 27, No. 1. pp. 79–104.
@article{28aebb655ac44aa6875823162195d247,
title = "Dynamical SimRank search on time-varying networks",
abstract = "SimRank is an appealing pair-wise similarity measure based on graph structure. It iteratively follows the intuition that two nodes are assessed as similar if they are pointed to by similar nodes. Many real graphs are large, and links are constantly subject to minor changes. In this article, we study the efficient dynamical computation of all-pairs SimRanks on time-varying graphs. Existing methods for the dynamical SimRank computation [e.g., LTSF (Shao et al. in PVLDB 8(8):838–849, 2015) and READS (Zhang et al. in PVLDB 10(5):601–612, 2017)] mainly focus on top-k search with respect to a given query. For all-pairs dynamical SimRank search, Li et al.’s approach (Li et al. in EDBT, 2010) was proposed for this problem. It first factorizes the graph via a singular value decomposition (SVD) and then incrementally maintains such a factorization in response to link updates at the expense of exactness. As a result, all pairs of SimRanks are updated approximately, yielding (Formula presented.) time and (Formula presented.) memory in a graph with n nodes, where r is the target rank of the low-rank SVD. Our solution to the dynamical computation of SimRank comprises of five ingredients: (1) We first consider edge update that does not accompany new node insertions. We show that the SimRank update (Formula presented.) in response to every link update is expressible as a rank-one Sylvester matrix equation. This provides an incremental method requiring (Formula presented.) time and (Formula presented.) memory in the worst case to update (Formula presented.) pairs of similarities for K iterations. (2) To speed up the computation further, we propose a lossless pruning strategy that captures the “affected areas” of (Formula presented.) to eliminate unnecessary retrieval. This reduces the time of the incremental SimRank to (Formula presented.), where m is the number of edges in the old graph, and (Formula presented.) is the size of “affected areas” in (Formula presented.), and in practice, (Formula presented.). (3) We also consider edge updates that accompany node insertions, and categorize them into three cases, according to which end of the inserted edge is a new node. For each case, we devise an efficient incremental algorithm that can support new node insertions and accurately update the affected SimRanks. (4) We next study batch updates for dynamical SimRank computation, and design an efficient batch incremental method that handles “similar sink edges” simultaneously and eliminates redundant edge updates. (5) To achieve linear memory, we devise a memory-efficient strategy that dynamically updates all pairs of SimRanks column by column in just (Formula presented.) memory, without the need to store all (Formula presented.) pairs of old SimRank scores. Experimental studies on various datasets demonstrate that our solution substantially outperforms the existing incremental SimRank methods and is faster and more memory-efficient than its competitors on million-scale graphs.",
keywords = "Dynamical networks, optimization, Similarity search, SimRank computation",
author = "Weiren Yu and Xuemin Lin and Wenjie Zhang and McCann, {Julie A.}",
note = "{\circledC} The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.",
year = "2018",
month = "2",
day = "1",
doi = "10.1007/s00778-017-0488-z",
language = "English",
volume = "27",
pages = "79–104",
number = "1",

}

Yu, W, Lin, X, Zhang, W & McCann, JA 2018, 'Dynamical SimRank search on time-varying networks', VLDB Journal, vol. 27, no. 1, pp. 79–104. https://doi.org/10.1007/s00778-017-0488-z

Dynamical SimRank search on time-varying networks. / Yu, Weiren; Lin, Xuemin; Zhang, Wenjie; McCann, Julie A.

In: VLDB Journal, Vol. 27, No. 1, 01.02.2018, p. 79–104.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Dynamical SimRank search on time-varying networks

AU - Yu, Weiren

AU - Lin, Xuemin

AU - Zhang, Wenjie

AU - McCann, Julie A.

N1 - © The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

PY - 2018/2/1

Y1 - 2018/2/1

N2 - SimRank is an appealing pair-wise similarity measure based on graph structure. It iteratively follows the intuition that two nodes are assessed as similar if they are pointed to by similar nodes. Many real graphs are large, and links are constantly subject to minor changes. In this article, we study the efficient dynamical computation of all-pairs SimRanks on time-varying graphs. Existing methods for the dynamical SimRank computation [e.g., LTSF (Shao et al. in PVLDB 8(8):838–849, 2015) and READS (Zhang et al. in PVLDB 10(5):601–612, 2017)] mainly focus on top-k search with respect to a given query. For all-pairs dynamical SimRank search, Li et al.’s approach (Li et al. in EDBT, 2010) was proposed for this problem. It first factorizes the graph via a singular value decomposition (SVD) and then incrementally maintains such a factorization in response to link updates at the expense of exactness. As a result, all pairs of SimRanks are updated approximately, yielding (Formula presented.) time and (Formula presented.) memory in a graph with n nodes, where r is the target rank of the low-rank SVD. Our solution to the dynamical computation of SimRank comprises of five ingredients: (1) We first consider edge update that does not accompany new node insertions. We show that the SimRank update (Formula presented.) in response to every link update is expressible as a rank-one Sylvester matrix equation. This provides an incremental method requiring (Formula presented.) time and (Formula presented.) memory in the worst case to update (Formula presented.) pairs of similarities for K iterations. (2) To speed up the computation further, we propose a lossless pruning strategy that captures the “affected areas” of (Formula presented.) to eliminate unnecessary retrieval. This reduces the time of the incremental SimRank to (Formula presented.), where m is the number of edges in the old graph, and (Formula presented.) is the size of “affected areas” in (Formula presented.), and in practice, (Formula presented.). (3) We also consider edge updates that accompany node insertions, and categorize them into three cases, according to which end of the inserted edge is a new node. For each case, we devise an efficient incremental algorithm that can support new node insertions and accurately update the affected SimRanks. (4) We next study batch updates for dynamical SimRank computation, and design an efficient batch incremental method that handles “similar sink edges” simultaneously and eliminates redundant edge updates. (5) To achieve linear memory, we devise a memory-efficient strategy that dynamically updates all pairs of SimRanks column by column in just (Formula presented.) memory, without the need to store all (Formula presented.) pairs of old SimRank scores. Experimental studies on various datasets demonstrate that our solution substantially outperforms the existing incremental SimRank methods and is faster and more memory-efficient than its competitors on million-scale graphs.

AB - SimRank is an appealing pair-wise similarity measure based on graph structure. It iteratively follows the intuition that two nodes are assessed as similar if they are pointed to by similar nodes. Many real graphs are large, and links are constantly subject to minor changes. In this article, we study the efficient dynamical computation of all-pairs SimRanks on time-varying graphs. Existing methods for the dynamical SimRank computation [e.g., LTSF (Shao et al. in PVLDB 8(8):838–849, 2015) and READS (Zhang et al. in PVLDB 10(5):601–612, 2017)] mainly focus on top-k search with respect to a given query. For all-pairs dynamical SimRank search, Li et al.’s approach (Li et al. in EDBT, 2010) was proposed for this problem. It first factorizes the graph via a singular value decomposition (SVD) and then incrementally maintains such a factorization in response to link updates at the expense of exactness. As a result, all pairs of SimRanks are updated approximately, yielding (Formula presented.) time and (Formula presented.) memory in a graph with n nodes, where r is the target rank of the low-rank SVD. Our solution to the dynamical computation of SimRank comprises of five ingredients: (1) We first consider edge update that does not accompany new node insertions. We show that the SimRank update (Formula presented.) in response to every link update is expressible as a rank-one Sylvester matrix equation. This provides an incremental method requiring (Formula presented.) time and (Formula presented.) memory in the worst case to update (Formula presented.) pairs of similarities for K iterations. (2) To speed up the computation further, we propose a lossless pruning strategy that captures the “affected areas” of (Formula presented.) to eliminate unnecessary retrieval. This reduces the time of the incremental SimRank to (Formula presented.), where m is the number of edges in the old graph, and (Formula presented.) is the size of “affected areas” in (Formula presented.), and in practice, (Formula presented.). (3) We also consider edge updates that accompany node insertions, and categorize them into three cases, according to which end of the inserted edge is a new node. For each case, we devise an efficient incremental algorithm that can support new node insertions and accurately update the affected SimRanks. (4) We next study batch updates for dynamical SimRank computation, and design an efficient batch incremental method that handles “similar sink edges” simultaneously and eliminates redundant edge updates. (5) To achieve linear memory, we devise a memory-efficient strategy that dynamically updates all pairs of SimRanks column by column in just (Formula presented.) memory, without the need to store all (Formula presented.) pairs of old SimRank scores. Experimental studies on various datasets demonstrate that our solution substantially outperforms the existing incremental SimRank methods and is faster and more memory-efficient than its competitors on million-scale graphs.

KW - Dynamical networks

KW - optimization

KW - Similarity search

KW - SimRank computation

UR - http://www.scopus.com/inward/record.url?scp=85034626108&partnerID=8YFLogxK

UR - https://link.springer.com/article/10.1007%2Fs00778-017-0488-z

U2 - 10.1007/s00778-017-0488-z

DO - 10.1007/s00778-017-0488-z

M3 - Article

AN - SCOPUS:85034626108

VL - 27

SP - 79

EP - 104

IS - 1

ER -

Yu W, Lin X, Zhang W, McCann JA. Dynamical SimRank search on time-varying networks. VLDB Journal. 2018 Feb 1;27(1):79–104. https://doi.org/10.1007/s00778-017-0488-z