# Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

KDD, 2019.

Keywords:

clustering, deep learning, graph convolutional networks, large-scale learning, semi-supervised learning

Abstract:

Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph a...

Introduction

- Graph convolutional network (GCN) [9] has become increasingly popular in addressing many graph-based applications, including semi-supervised node classification [9], link prediction [17] and recommender systems [15].
- GCN uses a graph convolution operation to obtain node embeddings layer by layer: at each layer, the embedding of a node is obtained by gathering the embeddings of its neighbors, followed by one or a few layers of linear transformations and nonlinear activations.
- Formally, a layer computes $X^{(l+1)} = \sigma(A' X^{(l)} W^{(l)})$, where $X^{(l)} \in \mathbb{R}^{N \times F_l}$ is the embedding at the $l$-th layer for all $N$ nodes and $X^{(0)} = X$; $A'$ is the normalized and regularized adjacency matrix, and $W^{(l)} \in \mathbb{R}^{F_l \times F_{l+1}}$ is the feature transformation matrix, which is learned for the downstream task.
- The activation function $\sigma(\cdot)$ is usually set to the element-wise ReLU.
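
The layer-wise update described above (aggregate neighbor embeddings with the normalized adjacency matrix, apply a linear map, then ReLU) can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the helper names, the toy graph, and the weights are invented for the example, and the row normalization of the adjacency matrix with self-loops is only one common choice that is assumed here.

```python
def normalize_adjacency(adj):
    """Assumed normalization: add self-loops, then divide each row by its sum."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in m]


def gcn_layer(a_norm, x, w):
    """One layer: X_next = ReLU(A' X W)."""
    return relu(matmul(matmul(a_norm, x), w))


# Toy path graph 0 - 1 - 2 with N = 3 nodes, F_0 = 2 input and F_1 = 2 output features.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # X^(0): one row per node
w0 = [[1.0, -1.0], [0.5, 1.0]]              # W^(0): F_0 x F_1, hypothetical values

a_norm = normalize_adjacency(adj)
x1 = gcn_layer(a_norm, x0, w0)              # X^(1): shape N x F_1
```

Stacking $L$ such calls gives an $L$-layer GCN; each call needs the embeddings of every neighbor of every node in the batch, which is where the cost of deep models on large graphs comes from.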

Highlights

- Graph convolutional network (GCN) [9] has become increasingly popular in addressing many graph-based applications, including semi-supervised node classification [9], link prediction [17] and recommender systems [15].
- In node classification problems, the final-layer embedding is passed to a classifier to predict node labels, so the parameters of the GCN can be trained end to end.
- As shown in Figure 6, GCN training on the Reddit data can be finished within a few hundred seconds.
- To test the scalability of GCN training algorithms, we constructed a much larger graph with over 2 million nodes and 61 million edges based on Amazon co-purchasing networks [11, 12].
- We present Cluster-GCN, a new GCN training algorithm that is fast and memory efficient.
- Experimental results show that this method can train very deep GCNs on large-scale graphs; for instance, on a graph with over 2 million nodes, training takes less than an hour using around 2 GB of memory and achieves an accuracy of 90.41 (F1 score).

Methods

- The authors evaluate the proposed method for training GCN on two tasks, multi-label and multi-class classification, on four public datasets.
- Note that Reddit is the largest public dataset the authors have seen so far for GCN, and the Amazon2M dataset, which they collected themselves, is much larger than Reddit.
- The comparisons include state-of-the-art GCN training algorithms.
- The Amazon dataset is a multi-label task with 334,863 nodes and 925,872 edges; its features are listed as N/A (Table 3).
- Table 4 lists the parameters used for each dataset (PPI, Reddit, Amazon, Amazon2M), including the number of partitions, which is 50 for PPI.

Results

- The largest public dataset for testing GCN so far is Reddit, which contains about 200K nodes; its statistics are shown in Table 3.
- As shown in Figure 6, GCN training on this data can be finished within a few hundred seconds.
- To test the scalability of GCN training algorithms, the authors constructed a much larger graph with over 2 million nodes and 61 million edges based on Amazon co-purchasing networks [11, 12].
- The detailed statistics of the dataset are listed in Table 3.

Conclusion

- The authors present Cluster-GCN, a new GCN training algorithm that is fast and memory efficient.
- Experimental results show that this method can train very deep GCNs on large-scale graphs; for instance, on a graph with over 2 million nodes, training takes less than an hour using around 2 GB of memory and achieves an accuracy of 90.41 (F1 score).
- The authors are able to successfully train much deeper GCNs, which achieve state-of-the-art test F1 scores on the PPI and Reddit datasets.
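
Accuracy is reported throughout as an F1 score. As a rough illustration of how such a score can be computed for multi-label predictions, the sketch below uses micro-averaging, which is an assumption: this summary does not say which averaging the authors use. The function name and the toy labels are hypothetical.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over rows of 0/1 label indicators."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p and t:
                tp += 1          # label predicted and present
            elif p and not t:
                fp += 1          # label predicted but absent
            elif t and not p:
                fn += 1          # label present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Three nodes, three possible labels each (toy data).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
score = micro_f1(y_true, y_pred)  # 4 true positives, 1 false positive, 1 false negative
```

Micro-averaging pools the counts over all nodes and labels before computing F1, so frequent labels weigh more than rare ones.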


- Table 1: Time and space complexity of GCN training algorithms. $L$ is the number of layers, $N$ is the number of nodes, $\|A\|_0$ is the number of nonzeros in the adjacency matrix, and $F$ is the number of features. For simplicity we assume the number of features is fixed for all layers. For SGD-based approaches, $b$ is the batch size and $r$ is the number of sampled neighbors per node. Note that due to the variance reduction technique, VR-GCN can work with a smaller $r$ than GraphSAGE and FastGCN. For memory complexity, $LF^2$ is for storing $\{W^{(l)}\}_{l=1}^{L}$ and the other term is for storing embeddings. For simplicity we omit the memory for storing the graph (GCN) or sub-graphs (other approaches) since they are fixed and usually not the main bottleneck
- Table 2: Random partition versus clustering partition of the graph (trained with mini-batch SGD). Clustering partition leads to better performance (in terms of test F1 score) since it removes fewer between-partition links. These three datasets are all public GCN datasets. We will explain the PPI data in the experiment part. Cora has 2,708 nodes and 13,264 edges, and Pubmed has 19,717 nodes and 108,365 edges
- Table 3: Data statistics
- Table 4: The parameters used in the experiments
- Table 5: Comparisons of memory usages on different datasets. Numbers in the brackets indicate the size of hidden units used in the model
- Table 6: Benchmarking on the Sparse Tensor operations in PyTorch and TensorFlow. A network with two linear layers is used and the timing includes forward and backward operations. Numbers in the brackets indicate the size of hidden units in the first layer. Amazon data is used
- Table 7: The most common categories in Amazon2M
- Table 8: Comparisons of running time, memory and testing accuracy (F1 score) for Amazon2M
- Table 9: Comparisons of running time when using different numbers of GCN layers. We use PPI and run both methods for 200 epochs
- Table 10: State-of-the-art performance of testing accuracy reported in recent papers
- Table 11: Comparisons of using different diagonal enhancement techniques. For all methods, we present the best validation accuracy achieved in 200 epochs. PPI is used and dropout rate is 0.1 in this experiment. Other settings are the same as in Section 4.1. The numbers marked red indicate poor convergence
- Table 12: The training, validation, and test splits used in the experiments. Note that for the two Amazon datasets we only split into training and test sets
- Table 13: The running time of graph clustering algorithm (METIS) and data preprocessing before the training of GCN
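
The Table 2 caption attributes the advantage of clustering partitions to the smaller number of between-partition links they remove. The toy sketch below illustrates that effect by counting cut edges under a community-aligned partition and under a random balanced one. Everything in it is an assumption made for illustration: the 8-node graph, the hand-written "clustering" assignment, and the helper names. The experiments themselves use METIS (Table 13), which is not called here.

```python
import random


def clique(nodes):
    """All undirected edges among the given nodes."""
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]


def cut_edges(edges, part):
    """Number of edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def induced_edges(edges, part, k):
    """Edges kept when training only on the sub-graph of partition k."""
    return [(u, v) for u, v in edges if part[u] == k and part[v] == k]


# Two dense communities of 4 nodes joined by a single bridge edge (3, 4).
edges = clique([0, 1, 2, 3]) + clique([4, 5, 6, 7]) + [(3, 4)]

# Partition aligned with the communities (stand-in for a graph clustering result).
cluster_part = {n: (0 if n < 4 else 1) for n in range(8)}

# Random balanced partition: shuffle the nodes, then alternate the assignment.
shuffled = list(range(8))
random.Random(0).shuffle(shuffled)
random_part = {n: i % 2 for i, n in enumerate(shuffled)}

cluster_cut = cut_edges(edges, cluster_part)      # only the bridge is lost
random_cut = cut_edges(edges, random_part)        # never smaller on this graph
batch0 = induced_edges(edges, cluster_part, 0)    # links available to one mini-batch
```

Each partition's induced sub-graph plays the role of one mini-batch, so every cut edge is a neighbor relation that the batch cannot use.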

References

- [1] Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In ICLR.
- [2] Jianfei Chen, Jun Zhu, and Le Song. 2018. Stochastic Training of Graph Convolutional Networks with Variance Reduction. In ICML.
- [3] Hanjun Dai, Zornitsa Kozareva, Bo Dai, Alex Smola, and Le Song. 2018. Learning Steady-States of Iterative Algorithms over Graphs. In ICML. 1114–1122.
- [4] Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis. 2007. Weighted Graph Cuts Without Eigenvectors: A Multilevel Approach. IEEE Trans. Pattern Anal. Mach. Intell. 29, 11 (2007), 1944–1957.
- [5] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NIPS.
- [6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. 770–778.
- [7] H. Hotelling. 1933. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24, 6 (1933), 417–441.
- [8] George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 1 (1998), 359–392.
- [9] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR.
- [10] Ziqi Liu, Chaochao Chen, Longfei Li, Jun Zhou, Xiaolong Li, Le Song, and Yuan Qi. 2019. GeniePath: Graph Neural Networks with Adaptive Receptive Paths. In AAAI.
- [11] Julian McAuley, Rahul Pandey, and Jure Leskovec. 2015. Inferring Networks of Substitutable and Complementary Products. In KDD.
- [12] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. 2015. Image-Based Recommendations on Styles and Substitutes. In SIGIR.
- [13] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. In NIPS-W.
- [14] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In ICLR.
- [15] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In KDD.
- [16] Jiani Zhang, Xingjian Shi, Junyuan Xie, Hao Ma, Irwin King, and Dit-Yan Yeung. 2018. GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs. In UAI.
- [17] Muhan Zhang and Yixin Chen. 2018. Link Prediction Based on Graph Neural Networks. In NIPS.
