An improved memory management scheme for large scale graph computing engine GraphChi

Yifang Jiang; Diao Zhang; Kai Chen; Qu Zhou; Yi Zhou; Jianhua He

doi:10.1109/BigData.2014.7004357

An improved memory management scheme for large scale graph computing engine GraphChi

Yifang Jiang, Diao Zhang, Kai Chen, Qu Zhou, Yi Zhou, Jianhua He

Adaptive Communications Networks Research Group

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs on a single PC efficiently. GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs. With the novel technique of parallel sliding windows (PSW) to load subgraph from disk to memory for vertices and edges updating, it can achieve data processing performance close to and even better than those of mainstream distributed graph engines. GraphChi mentioned that its memory is not effectively utilized with large dataset, which leads to suboptimal computation performances. In this paper we are motivated by the concepts of 'pin ' from TurboGraph and 'ghost' from GraphLab to propose a new memory utilization mode for GraphChi, which is called Part-in-memory mode, to improve the GraphChi algorithm performance. The main idea is to pin a fixed part of data inside the memory during the whole computing process. Part-in-memory mode is successfully implemented with only about 40 additional lines of code to the original GraphChi engine. Extensive experiments are performed with large real datasets (including Twitter graph with 1.4 billion edges). The preliminary results show that Part-in-memory mode memory management approach effectively reduces the GraphChi running time by up to 60% in PageRank algorithm. Interestingly it is found that a larger portion of data pinned in memory does not always lead to better performance in the case that the whole dataset cannot be fitted in memory. There exists an optimal portion of data which should be kept in the memory to achieve the best computational performance.

Original language	English
Title of host publication	Proceedings : 2014 IEEE international conference on Big Data
Editors	Jimmy Lin, Jian Pei, Xiaohua Hu, Wo Chang, Raghunath Nambiar, Charu Aggarwal, Nick Cercone, Vasant Honavar, Jun Huan, Bamshad Mobasher, Saumyadipta Pyne
Publisher	IEEE
Pages	58-63
Number of pages	6
ISBN (Print)	978-1-4799-5665-4
DOIs	https://doi.org/10.1109/BigData.2014.7004357
Publication status	Published - 2015
Event	2nd IEEE International Conference on Big Data - Washington DC, United States Duration: 27 Oct 2014 → 30 Oct 2014

Conference

Conference	2nd IEEE International Conference on Big Data
Abbreviated title	IEEE Big Data 2014
Country/Territory	United States
City	Washington DC
Period	27/10/14 → 30/10/14

Bibliographical note

Funding: National Natural Science Foundation of China (Grant No. 61201384, 61129001), and Shanghai Science and Technology Committees of Scientific Research Project (Grant No. 14DZ1101200).

Keywords

big data
graph process
GraphChi
part-in-memory mode

Access to Document

10.1109/BigData.2014.7004357

Cite this

Jiang, Y., Zhang, D., Chen, K., Zhou, Q., Zhou, Y., & He, J. (2015). An improved memory management scheme for large scale graph computing engine GraphChi. In J. Lin, J. Pei, X. Hu, W. Chang, R. Nambiar, C. Aggarwal, N. Cercone, V. Honavar, J. Huan, B. Mobasher, & S. Pyne (Eds.), Proceedings : 2014 IEEE international conference on Big Data (pp. 58-63). IEEE. https://doi.org/10.1109/BigData.2014.7004357

Jiang, Yifang ; Zhang, Diao ; Chen, Kai et al. / An improved memory management scheme for large scale graph computing engine GraphChi. Proceedings : 2014 IEEE international conference on Big Data. editor / Jimmy Lin ; Jian Pei ; Xiaohua Hu ; Wo Chang ; Raghunath Nambiar ; Charu Aggarwal ; Nick Cercone ; Vasant Honavar ; Jun Huan ; Bamshad Mobasher ; Saumyadipta Pyne. IEEE, 2015. pp. 58-63

@inproceedings{ff762795f6a74f399f7b49fd19dc145b,

title = "An improved memory management scheme for large scale graph computing engine GraphChi",

abstract = "GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs on a single PC efficiently. GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs. With the novel technique of parallel sliding windows (PSW) to load subgraph from disk to memory for vertices and edges updating, it can achieve data processing performance close to and even better than those of mainstream distributed graph engines. GraphChi mentioned that its memory is not effectively utilized with large dataset, which leads to suboptimal computation performances. In this paper we are motivated by the concepts of 'pin ' from TurboGraph and 'ghost' from GraphLab to propose a new memory utilization mode for GraphChi, which is called Part-in-memory mode, to improve the GraphChi algorithm performance. The main idea is to pin a fixed part of data inside the memory during the whole computing process. Part-in-memory mode is successfully implemented with only about 40 additional lines of code to the original GraphChi engine. Extensive experiments are performed with large real datasets (including Twitter graph with 1.4 billion edges). The preliminary results show that Part-in-memory mode memory management approach effectively reduces the GraphChi running time by up to 60% in PageRank algorithm. Interestingly it is found that a larger portion of data pinned in memory does not always lead to better performance in the case that the whole dataset cannot be fitted in memory. There exists an optimal portion of data which should be kept in the memory to achieve the best computational performance.",

keywords = "big data, graph process, GraphChi, part-in-memory mode",

author = "Yifang Jiang and Diao Zhang and Kai Chen and Qu Zhou and Yi Zhou and Jianhua He",

note = "Funding: National Natural Science Foundation of China (Grant No. 61201384, 61129001), and Shanghai Science and Technology Committees of Scientific Research Project (Grant No. 14DZ1101200).; 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 ; Conference date: 27-10-2014 Through 30-10-2014",

year = "2015",

doi = "10.1109/BigData.2014.7004357",

language = "English",

isbn = "978-1-4799-5665-4",

pages = "58--63",

editor = "Jimmy Lin and Jian Pei and Xiaohua Hu and Wo Chang and Raghunath Nambiar and Charu Aggarwal and Nick Cercone and Vasant Honavar and Jun Huan and Bamshad Mobasher and Saumyadipta Pyne",

booktitle = "Proceedings : 2014 IEEE international conference on Big Data",

publisher = "IEEE",

address = "United States",

}

Jiang, Y, Zhang, D, Chen, K, Zhou, Q, Zhou, Y & He, J 2015, An improved memory management scheme for large scale graph computing engine GraphChi. in J Lin, J Pei, X Hu, W Chang, R Nambiar, C Aggarwal, N Cercone, V Honavar, J Huan, B Mobasher & S Pyne (eds), Proceedings : 2014 IEEE international conference on Big Data. IEEE, pp. 58-63, 2nd IEEE International Conference on Big Data, Washington DC, United States, 27/10/14. https://doi.org/10.1109/BigData.2014.7004357

An improved memory management scheme for large scale graph computing engine GraphChi. / Jiang, Yifang; Zhang, Diao; Chen, Kai et al.
Proceedings : 2014 IEEE international conference on Big Data. ed. / Jimmy Lin; Jian Pei; Xiaohua Hu; Wo Chang; Raghunath Nambiar; Charu Aggarwal; Nick Cercone; Vasant Honavar; Jun Huan; Bamshad Mobasher; Saumyadipta Pyne. IEEE, 2015. p. 58-63.

Research output: Chapter in Book/Published conference output › Conference publication

TY - GEN

T1 - An improved memory management scheme for large scale graph computing engine GraphChi

AU - Jiang, Yifang

AU - Zhang, Diao

AU - Chen, Kai

AU - Zhou, Qu

AU - Zhou, Yi

AU - He, Jianhua

N1 - Funding: National Natural Science Foundation of China (Grant No. 61201384, 61129001), and Shanghai Science and Technology Committees of Scientific Research Project (Grant No. 14DZ1101200).

PY - 2015

Y1 - 2015

N2 - GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs on a single PC efficiently. GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs. With the novel technique of parallel sliding windows (PSW) to load subgraph from disk to memory for vertices and edges updating, it can achieve data processing performance close to and even better than those of mainstream distributed graph engines. GraphChi mentioned that its memory is not effectively utilized with large dataset, which leads to suboptimal computation performances. In this paper we are motivated by the concepts of 'pin ' from TurboGraph and 'ghost' from GraphLab to propose a new memory utilization mode for GraphChi, which is called Part-in-memory mode, to improve the GraphChi algorithm performance. The main idea is to pin a fixed part of data inside the memory during the whole computing process. Part-in-memory mode is successfully implemented with only about 40 additional lines of code to the original GraphChi engine. Extensive experiments are performed with large real datasets (including Twitter graph with 1.4 billion edges). The preliminary results show that Part-in-memory mode memory management approach effectively reduces the GraphChi running time by up to 60% in PageRank algorithm. Interestingly it is found that a larger portion of data pinned in memory does not always lead to better performance in the case that the whole dataset cannot be fitted in memory. There exists an optimal portion of data which should be kept in the memory to achieve the best computational performance.

AB - GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs on a single PC efficiently. GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs. With the novel technique of parallel sliding windows (PSW) to load subgraph from disk to memory for vertices and edges updating, it can achieve data processing performance close to and even better than those of mainstream distributed graph engines. GraphChi mentioned that its memory is not effectively utilized with large dataset, which leads to suboptimal computation performances. In this paper we are motivated by the concepts of 'pin ' from TurboGraph and 'ghost' from GraphLab to propose a new memory utilization mode for GraphChi, which is called Part-in-memory mode, to improve the GraphChi algorithm performance. The main idea is to pin a fixed part of data inside the memory during the whole computing process. Part-in-memory mode is successfully implemented with only about 40 additional lines of code to the original GraphChi engine. Extensive experiments are performed with large real datasets (including Twitter graph with 1.4 billion edges). The preliminary results show that Part-in-memory mode memory management approach effectively reduces the GraphChi running time by up to 60% in PageRank algorithm. Interestingly it is found that a larger portion of data pinned in memory does not always lead to better performance in the case that the whole dataset cannot be fitted in memory. There exists an optimal portion of data which should be kept in the memory to achieve the best computational performance.

KW - big data

KW - graph process

KW - GraphChi

KW - part-in-memory mode

UR - http://www.scopus.com/inward/record.url?scp=84921754006&partnerID=8YFLogxK

UR - https://ieeexplore.ieee.org/document/7004357/

U2 - 10.1109/BigData.2014.7004357

DO - 10.1109/BigData.2014.7004357

M3 - Conference publication

AN - SCOPUS:84921754006

SN - 978-1-4799-5665-4

SP - 58

EP - 63

BT - Proceedings : 2014 IEEE international conference on Big Data

A2 - Lin, Jimmy

A2 - Pei, Jian

A2 - Hu, Xiaohua

A2 - Chang, Wo

A2 - Nambiar, Raghunath

A2 - Aggarwal, Charu

A2 - Cercone, Nick

A2 - Honavar, Vasant

A2 - Huan, Jun

A2 - Mobasher, Bamshad

A2 - Pyne, Saumyadipta

PB - IEEE

T2 - 2nd IEEE International Conference on Big Data

Y2 - 27 October 2014 through 30 October 2014

ER -

An improved memory management scheme for large scale graph computing engine GraphChi

Abstract

Conference

Bibliographical note

Keywords

Access to Document

Other files and links

Fingerprint

Cite this