Abstract

With the advent of the Internet, graph-structured data have become ubiquitous. An essential task in managing graph-structured data is similarity search based on graph topology, which has a wide spectrum of applications, e.g., web search, outlier detection, co-citation analysis, and collaborative filtering. Such graph topology data arrive from multiple sources with astounding velocity, volume, and veracity. As the scale of network-structured data increases, existing similarity search algorithms become impractical on large graphs because of their high costs in computational time and memory space. Moreover, dynamic changes (e.g., noise and abnormality) exist in network data, and they arise from many factors, such as data loss in transfer, data incompleteness, and dirty reads. These dynamic changes have therefore become the main barrier to obtaining accurate results in efficient network analysis.
In real Web applications, CoSimRank has been proposed as a robust measure of node-pair similarity based on graph topology. It follows a SimRank-like notion that “two nodes are considered as similar if their in-neighbours are similar”, but, unlike SimRank, the similarity of a node with itself is not constantly 1. The CoSimRank score of each node pair is computed as the sum of dot products of two Personalised PageRank vectors. However, existing work on CoSimRank is restricted to static graphs. When the graph is updated over time through the addition and deletion of edges (or nodes), recomputing all CoSimRank scores from scratch is cost-prohibitive and therefore impractical. RoleSim is a popular graph-structural role similarity measure with many applications (e.g., sociometry). It captures the automorphic equivalence of node pairs, which SimRank and CoSimRank do not, but the accuracy of the RoleSim algorithm can be improved. In this study, (1) we propose fast dynamic schemes, D-CoSim and D-deCoSim, for accurate CoSimRank search over large-scale evolving graphs. (2) Based on D-CoSim, we also propose fast schemes, F-CoSim and Opt_F-CoSim, which greatly accelerate CoSimRank search over static graphs. Our theoretical analysis shows that D-CoSim, D-deCoSim, F-CoSim and Opt_F-CoSim guarantee the exactness of CoSimRank scores. Experimental evaluations verify the superiority of D-CoSim and D-deCoSim over evolving graphs, and the speedup of F-CoSim and Opt_F-CoSim over their competitors on large-scale static graphs, without any loss of accuracy. (3) We propose a novel role similarity search algorithm, FaRS, and a speedup algorithm, Opt_FaRS, which guarantee that automorphic equivalence is captured and which also capture information from the classes of a node's neighbours. The experimental results show that FaRS and Opt_FaRS achieve higher accuracy than the baseline algorithms.
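
To make the description of CoSimRank as a "sum of dot products of two Personalised PageRank vectors" concrete, the following is a minimal Python sketch of the baseline static computation. It is not D-CoSim, F-CoSim, or any other scheme proposed in the thesis. The function name `cosimrank`, the decay factor `c`, the truncation at `iterations` steps, and the toy graph are all assumptions made for illustration, and the sketch assumes the commonly published form of the measure, in which each vector starts as a unit vector at its node and spreads its mass uniformly over in-neighbours at every step.

```python
def cosimrank(in_neighbours, i, j, c=0.8, iterations=10):
    """Truncated CoSimRank score between nodes i and j (illustrative sketch).

    in_neighbours maps each node to the list of its in-neighbours.
    c (decay factor) and iterations (truncation depth) are assumed parameters.
    """

    def step(vector):
        # Spread the mass held at each node uniformly over its in-neighbours.
        # Mass at a node with no in-neighbours is dropped.
        spread = {}
        for node, mass in vector.items():
            preds = in_neighbours.get(node, [])
            for pred in preds:
                spread[pred] = spread.get(pred, 0.0) + mass / len(preds)
        return spread

    def dot(p, q):
        # Sparse dot product of two vectors stored as dictionaries.
        return sum(value * q.get(node, 0.0) for node, value in p.items())

    p, q = {i: 1.0}, {j: 1.0}  # unit vectors at the two query nodes
    score = 0.0
    for k in range(iterations + 1):
        score += (c ** k) * dot(p, q)  # k-th term of the sum of dot products
        p, q = step(p), step(q)
    return score


# Hypothetical toy graph with edges a -> c, b -> c, a -> d.
graph = {"c": ["a", "b"], "d": ["a"]}
pair_score = cosimrank(graph, "c", "d")
self_score = cosimrank(graph, "c", "c")
```

On this toy graph, `pair_score` comes out at about 0.4 and `self_score` at about 1.4, which illustrates the point that the similarity of a node with itself is not fixed at 1. Recomputing every pair this way after each edge update is the from-scratch cost that the dynamic schemes aim to avoid.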
|Date of Award||Dec 2021|
|Supervisor||Hai Wang (Supervisor)|
- Social Network
- SimRank Model
- Role-based similarity
- Web search