# Graph Random Neural Networks for Semi-Supervised Learning on Graphs

NeurIPS 2020.

Keywords:

Graph Random Neural Networks, machine learning, multilayer perceptron, random propagation, graph neural network

Abstract:

We study the problem of semi-supervised learning on graphs, for which graph neural networks (GNNs) have been extensively explored. However, most existing GNNs inherently suffer from the limitations of over-smoothing, non-robustness, and weak generalization when labeled nodes are scarce. In this paper, we propose a simple yet effective framework...

Introduction

- Graphs serve as a common language for modeling structured and relational data [26], such as social networks, knowledge graphs, and the World Wide Web.
- The main idea of GNNs lies in a deterministic feature propagation process to learn expressive node representations.
- Recent studies show that such a propagation procedure brings some inherent issues: First, most GNNs suffer from over-smoothing [27, 7, 28, 34].
- A very recent work [34] suggests that the coupled non-linear transformation in the propagation procedure can further aggravate this issue.
- Second, the deterministic propagation makes each node highly dependent on its neighborhood, leaving nodes easily misguided by potential data noise and susceptible to adversarial perturbations.

Highlights

- Graphs serve as a common language for modeling structured and relational data [26], such as social networks, knowledge graphs, and the World Wide Web
- To effectively augment graph data, we propose random propagation in GRAPH RANDOM NEURAL NETWORKS (GRAND), wherein each node’s features can be randomly dropped either partially or entirely, after which the perturbed feature matrix is propagated over the graph
- GRAND improves upon graph convolutional network (GCN) by margins of 3.9%, 5.1%, and 3.7% on Cora, Citeseer, and Pubmed, while GAT's improvements over GCN were 1.5%, 2.2%, and 0%, respectively
- We study the problem of semi-supervised learning on graphs and present the GRAPH RANDOM NEURAL NETWORKS (GRAND)
- In GRAND, we propose the random propagation strategy to stochastically generate multiple graph data augmentations, based on which we utilize consistency regularization to improve the model’s generalization on unlabeled data
- When compared to the very recent regularization based model—DropEdge, the proposed model achieves 2.6%, 3.1%, and 3.1% improvements, while DropEdge’s improvements over GCN were only 1.3%, 2.0%, and 0.6%, respectively
- The simple and effective ideas presented in GRAND may generate a different perspective in graph neural networks (GNNs) design, in particular for semi-supervised graph learning
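
The random propagation step described above (randomly dropping each node's features entirely, then propagating the perturbed feature matrix over the graph) can be sketched in a few lines of numpy. The hyperparameter names (`drop_rate`, `K`), the symmetric normalization, and the averaging over propagation orders 0..K are illustrative assumptions based on the description here, not the authors' exact implementation.

```python
import numpy as np

def random_propagation(A, X, drop_rate=0.5, K=4, rng=None):
    """One random-propagation augmentation: drop whole node feature rows
    at random (DropNode), rescale the survivors, then average the
    propagated features over orders 0..K. A sketch, not the reference code."""
    rng = rng or np.random.default_rng()
    n = X.shape[0]
    # Symmetrically normalized adjacency with self-loops:
    # A_hat = D^{-1/2} (A + I) D^{-1/2}
    A_tilde = A + np.eye(n)
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
    # DropNode: zero each node's whole feature vector with prob drop_rate,
    # scale the rest by 1/(1 - drop_rate) to keep the expectation unchanged.
    mask = rng.random(n) >= drop_rate
    X_perturbed = X * mask[:, None] / (1.0 - drop_rate)
    # Mixed-order propagation: average of A_hat^k @ X_perturbed for k = 0..K.
    out = np.zeros_like(X_perturbed)
    Z = X_perturbed.copy()
    out += Z
    for _ in range(K):
        Z = A_hat @ Z
        out += Z
    return out / (K + 1)
```

Because the drop mask is resampled on every call, each invocation yields a different augmentation of the same graph, which is what the consistency-regularized training below exploits.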

Methods

**Regularization Methods for GNNs**

Another line of work has aimed to design powerful regularization methods for GNNs, such as VBAT [10], GraphVAT [12], G3NN [29], GraphMix [42], and DropEdge [37].

- GraphMix [42] introduces the MixUp strategy [49] for training GNNs. Different from GRAND, GraphMix augments graph data by performing linear interpolation between two samples in the hidden space, and regularizes GNNs by encouraging the model to predict the same interpolation of corresponding labels.
- The idea is to design a propagation strategy (a) to stochastically generate multiple graph data augmentations (b), based on which the authors present a consistency regularized training (c) for improving the generalization capacity under the semi-supervised setting.
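
The consistency-regularized training of step (c) can be sketched as follows, assuming a MixMatch-style scheme: average the predicted class distributions from S random augmentations, sharpen the average with a temperature T, and penalize each augmentation's squared distance to that sharpened target. The exact distance and sharpening form here are assumptions based on the description above.

```python
import numpy as np

def consistency_loss(preds, T=0.5):
    """Consistency regularization over S augmentations.
    preds: array of shape (S, n_nodes, n_classes), each row a distribution."""
    avg = preds.mean(axis=0)                       # (n, C) average prediction
    sharpened = avg ** (1.0 / T)                   # temperature sharpening
    sharpened /= sharpened.sum(axis=1, keepdims=True)
    # Treat the sharpened average as a fixed target for every augmentation.
    return np.mean((preds - sharpened[None]) ** 2)
```

The loss is zero when all augmentations agree and grows as their predictions diverge, which pushes the model toward predictions that are stable under random propagation.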

Results

- The results of GRAND are averaged over 100 runs with random weight initializations.
- From the top part of Table 1, the authors can observe that GRAND consistently outperforms all baselines by a large margin across all datasets.
- The authors fix K and S to the best values and perform a grid search for T and λ over {0.1, 0.2, 0.3, 0.5} and {0.5, 0.7, 1.0}, respectively.
- For each hyperparameter configuration, the authors run the experiments with 20 random seeds and select the best configuration based on average accuracy on the validation set.
- The authors did not spend much effort tuning these hyperparameters in practice, as they observe that GRAND is not very sensitive to them.
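
The search procedure described above can be sketched as follows. `train_and_eval` is a hypothetical user-supplied function (not from the paper) that trains the model with a given T, λ, and random seed and returns validation accuracy.

```python
import itertools
import statistics

def grid_search(train_and_eval,
                T_grid=(0.1, 0.2, 0.3, 0.5),
                lam_grid=(0.5, 0.7, 1.0),
                n_seeds=20):
    """For each (T, lambda) pair, average validation accuracy over
    n_seeds runs and keep the best configuration. A sketch of the
    search described in the text; `train_and_eval` is assumed."""
    best = None
    for T, lam in itertools.product(T_grid, lam_grid):
        accs = [train_and_eval(T=T, lam=lam, seed=s) for s in range(n_seeds)]
        mean_acc = statistics.mean(accs)
        if best is None or mean_acc > best[0]:
            best = (mean_acc, T, lam)
    return best  # (best mean accuracy, best T, best lambda)
```

Averaging over 20 seeds before comparing configurations reduces the chance of selecting a setting that only looked good under one lucky initialization.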

Conclusion

- The authors study the problem of semi-supervised learning on graphs and present the GRAPH RANDOM NEURAL NETWORKS (GRAND).
- In GRAND, the authors propose the random propagation strategy to stochastically generate multiple graph data augmentations, based on which the authors utilize consistency regularization to improve the model’s generalization on unlabeled data.
- The simple and effective ideas presented in GRAND may generate a different perspective in GNN design, in particular for semi-supervised graph learning.
- The authors aim to further improve the scalability of GRAND with sampling methods.

Summary

## Introduction:

- Graphs serve as a common language for modeling structured and relational data [26], such as social networks, knowledge graphs, and the World Wide Web.
- The main idea of GNNs lies in a deterministic feature propagation process to learn expressive node representations.
- Recent studies show that such a propagation procedure brings some inherent issues: First, most GNNs suffer from over-smoothing [27, 7, 28, 34].
- A very recent work [34] suggests that the coupled non-linear transformation in the propagation procedure can further aggravate this issue.
- Second, the deterministic propagation makes each node highly dependent on its neighborhood, leaving nodes easily misguided by potential data noise and susceptible to adversarial perturbations.
## Objectives:

The authors aim to further improve the scalability of GRAND with sampling methods. The authors' goal is to predict the corresponding topic of each paper based on the feature matrix and citation graph structure.

## Methods:

**Regularization Methods for GNNs**

Another line of work has aimed to design powerful regularization methods for GNNs, such as VBAT [10], GraphVAT [12], G3NN [29], GraphMix [42], and DropEdge [37].

- GraphMix [42] introduces the MixUp strategy [49] for training GNNs. Different from GRAND, GraphMix augments graph data by performing linear interpolation between two samples in the hidden space, and regularizes GNNs by encouraging the model to predict the same interpolation of corresponding labels.
- The idea is to design a propagation strategy (a) to stochastically generate multiple graph data augmentations (b), based on which the authors present a consistency regularized training (c) for improving the generalization capacity under the semi-supervised setting.
## Results:

The results of GRAND are averaged over 100 runs with random weight initializations.

- From the top part of Table 1, the authors can observe that GRAND consistently outperforms all baselines by a large margin across all datasets.
- The authors fix K and S to the best values and perform a grid search for T and λ over {0.1, 0.2, 0.3, 0.5} and {0.5, 0.7, 1.0}, respectively.
- For each hyperparameter configuration, the authors run the experiments with 20 random seeds and select the best configuration based on average accuracy on the validation set.
- The authors did not spend much effort tuning these hyperparameters in practice, as they observe that GRAND is not very sensitive to them.
## Conclusion:

The authors study the problem of semi-supervised learning on graphs and present the GRAPH RANDOM NEURAL NETWORKS (GRAND).

- In GRAND, the authors propose the random propagation strategy to stochastically generate multiple graph data augmentations, based on which the authors utilize consistency regularization to improve the model’s generalization on unlabeled data.
- The simple and effective ideas presented in GRAND may generate a different perspective in GNN design, in particular for semi-supervised graph learning.
- The authors aim to further improve the scalability of GRAND with sampling methods.

- Table1: Overall classification accuracy (%)
- Table2: Benchmark Dataset statistics
- Table3: Hyperparameters of GRAND for results in Table 1
- Table4: Statistics of Large Datasets
- Table5: Results on large datasets
- Table6: Classification Accuracy under different label rates (%)

Related work

- Let G = (V, E) denote a graph, where V is the set of n = |V| nodes and E ⊆ V × V is the set of |E| edges between nodes. A ∈ {0, 1}^(n×n) denotes the adjacency matrix of G, with A_ij = 1 indicating an edge between v_i and v_j, and A_ij = 0 otherwise.

Semi-Supervised Learning on Graphs. This work focuses on semi-supervised graph learning, in which each node v_i is associated with 1) a feature vector X_i, the i-th row of the feature matrix X ∈ R^(n×d), and 2) a label vector Y_i, the i-th row of the label matrix Y ∈ {0, 1}^(n×C), with C denoting the number of classes. For semi-supervised classification, m nodes (0 < m ≪ n) have observed labels Y_L, and the labels Y_U of the remaining n − m nodes are missing. The objective is to learn a predictive function f : (G, X, Y_L) → Y_U to infer the missing labels Y_U for the unlabeled nodes. Traditional approaches to this problem are mostly based on graph Laplacian regularization [52, 50, 31, 44, 2]. Recently, graph neural networks (GNNs) have emerged as a powerful approach for semi-supervised graph learning, which are reviewed below.
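
As a concrete instance of the setup above, the following sketch builds a toy 4-node graph with its adjacency matrix A, feature matrix X, one-hot label matrix Y, and a labeled-node mask; all values here are illustrative, not from the paper's datasets.

```python
import numpy as np

# Toy instance: n nodes, C classes, d-dimensional features.
n, C, d = 4, 2, 3
edges = [(0, 1), (1, 2), (2, 3)]        # E as a list of node pairs

# Adjacency matrix A in {0, 1}^(n x n); undirected, so A is symmetric.
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Feature matrix X in R^(n x d) and one-hot label matrix Y in {0, 1}^(n x C).
X = np.random.default_rng(0).integers(0, 2, size=(n, d)).astype(float)
Y = np.eye(C)[[0, 0, 1, 1]]

# m = 2 labeled nodes (0 < m << n in real settings); the task is to infer
# Y on the unlabeled rows from (G, X, Y[labeled]).
labeled = np.array([True, False, False, True])
```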


Reference

- Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Hrayr Harutyunyan, Nazanin Alipourfard, Kristina Lerman, Greg Ver Steeg, and Aram Galstyan. Mixhop: Higher-order graph convolution architectures via sparsified neighborhood mixing. ICML’19, 2019.
- Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of machine learning research, 7(Nov):2399–2434, 2006.
- David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. Mixmatch: A holistic approach to semi-supervised learning. NeurIPS’19, 2019.
- Aleksandar Bojchevski and Stephan Günnemann. Deep gaussian embedding of graphs: Unsupervised inductive learning via ranking. In ICLR, 2017.
- Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv:1312.6203, 2013.
- Olivier Chapelle, Bernhard Scholkopf, and Alexander Zien. Semi-supervised learning. IEEE Transactions on Neural Networks, 2009.
- Deli Chen, Yankai Lin, Wei Li, Peng Li, Jie Zhou, and Xu Sun. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In AAAI’20, 2020.
- Jie Chen, Tengfei Ma, and Cao Xiao. Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv:1801.10247, 2018.
- Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NeurIPS’16, 2016.
- Zhijie Deng, Yinpeng Dong, and Jun Zhu. Batch virtual adversarial training for graph convolutional networks. arXiv preprint arXiv:1902.09192, 2019.
- Ming Ding, Jie Tang, and Jie Zhang. Semi-supervised learning on graphs with generative adversarial nets. In CIKM’18, 2018.
- Fuli Feng, Xiangnan He, Jie Tang, and Tat-Seng Chua. Graph adversarial training: Dynamically regularizing based on graph structure. IEEE Transactions on Knowledge and Data Engineering, 2019.
- Hongyang Gao and Shuiwang Ji. Graph u-nets. ICML’19, 2019.
- Yang Gao, Hong Yang, Peng Zhang, Chuan Zhou, and Yue Hu. Graphnas: Graph neural architecture search with reinforcement learning. arXiv preprint arXiv:1904.09981, 2019.
- Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. arXiv:1704.01212, 2017.
- Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS’10, 2010.
- Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. In IJCNN’05, 2005.
- Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In NeurIPS’05, 2005.
- Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NeurIPS’17, pages 1025–1035, 2017.
- Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. arXiv:1506.05163, 2015.
- Wenbing Huang, Tong Zhang, Yu Rong, and Junzhou Huang. Adaptive sampling towards fast graph representation learning. In NeurIPS’18, 2018.
- Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML’15, 2015.
- Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ICLR’14, 2014.
- Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907, 2016.
- Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. Predict then propagate: Graph neural networks meet personalized pagerank. arXiv preprint arXiv:1810.05997, 2018.
- Jure Leskovec and Rok Sosic. Snap: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST), 8(1):1, 2016.
- Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In AAAI’18, 2018.
- Lingxiao Zhao and Leman Akoglu. Pairnorm: Tackling oversmoothing in gnns. ICLR’20, 2020.
- Jiaqi Ma, Weijing Tang, Ji Zhu, and Qiaozhu Mei. A flexible generative framework for graphbased semi-supervised learning. NeurIPS’19, 2019.
- Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks. Annual review of sociology, 27(1):415–444, 2001.
- Qiaozhu Mei, Duo Zhang, and ChengXiang Zhai. A general optimization framework for smoothing language models on graph structures. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 611–618, 2008.
- Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence, 41(8):1979–1993, 2018.
- Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodola, Jan Svoboda, and Michael M Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. In CVPR’17, 2017.
- Kenta Oono and Taiji Suzuki. Graph neural networks exponentially lose expressive power for node classification. In International Conference on Learning Representations, 2020.
- Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global vectors for word representation. In EMNLP’14, 2014.
- Meng Qu, Yoshua Bengio, and Jian Tang. Gmnn: Graph markov neural networks. ICML’19, 2019.
- Yu Rong, Wenbing Huang, Tingyang Xu, and Junzhou Huang. Dropedge: Towards deep graph convolutional networks on node classification. ICLR’20, 2020.
- Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.
- Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann. Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868, 2018.
- Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 2014.
- Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. ICLR’18, 2018.
- Vikas Verma, Meng Qu, Alex Lamb, Yoshua Bengio, Juho Kannala, and Jian Tang. Graphmix: Regularized training of graph neural networks for semi-supervised learning, 2019.
- Stefan Wager, Sida Wang, and Percy S Liang. Dropout training as adaptive regularization. In Advances in neural information processing systems, pages 351–359, 2013.
- Jason Weston, Frédéric Ratle, Hossein Mobahi, and Ronan Collobert. Deep learning via semisupervised embedding. In Neural networks: Tricks of the trade, pages 639–655.
- Felix Wu, Tianyi Zhang, Amauri Holanda de Souza Jr, Christopher Fifty, Tao Yu, and Kilian Q Weinberger. Simplifying graph convolutional networks. arXiv preprint arXiv:1902.07153, 2019.
- Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, and Quoc V Le. Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848, 2019.
- Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representation learning on graphs with jumping knowledge networks. arXiv preprint arXiv:1806.03536, 2018.
- Zhilin Yang, William W Cohen, and Ruslan Salakhutdinov. Revisiting semi-supervised learning with graph embeddings. ICML’16, 2016.
- Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. ICLR’18, 2018.
- Dengyong Zhou, Olivier Bousquet, Thomas N Lal, Jason Weston, and Bernhard Schölkopf. Learning with local and global consistency. In Advances in neural information processing systems, pages 321–328, 2004.
- Dingyuan Zhu, Ziwei Zhang, Peng Cui, and Wenwu Zhu. Robust graph convolutional networks against adversarial attacks. In KDD’19, 2019.
- Xiaojin Zhu, Zoubin Ghahramani, and John D Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International conference on Machine learning (ICML-03), pages 912–919, 2003.
- Difan Zou, Ziniu Hu, Yewen Wang, Song Jiang, Yizhou Sun, and Quanquan Gu. Layerdependent importance sampling for training deep and large graph convolutional networks. In NeurIPS’19, 2019.
- Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks on neural networks for graph data. In KDD’18, 2018.
- Daniel Zügner and Stephan Günnemann. Adversarial attacks on graph neural networks via meta learning. ICLR’19, 2019.
