Graph Random Neural Networks for Semi-Supervised Learning on Graphs

NeurIPS 2020.

Keywords:
Graph Random Neural Networks, machine learning, Multilayer Perceptron, random propagation, graph neural network

Abstract:

We study the problem of semi-supervised learning on graphs, for which graph neural networks (GNNs) have been extensively explored. However, most existing GNNs inherently suffer from the limitations of over-smoothing, non-robustness, and weak-generalization when labeled nodes are scarce. In this paper, we propose a simple yet effective framework…

Introduction
  • Graphs serve as a common language for modeling structured and relational data [26], such as social networks, knowledge graphs, and the World Wide Web.
  • The main idea of GNNs lies in a deterministic feature propagation process to learn expressive node representations.
  • Recent studies show that such a propagation procedure introduces several inherent issues: First, most GNNs suffer from over-smoothing [27, 7, 28, 34].
  • A very recent work [34] suggests that the coupled non-linear transformation in the propagation procedure can further aggravate this issue.
  • Second, the deterministic propagation makes each node highly dependent on its neighborhood, leaving nodes easily misguided by potential data noise and susceptible to adversarial perturbations.
Highlights
  • Graphs serve as a common language for modeling structured and relational data [26], such as social networks, knowledge graphs, and the World Wide Web
  • To effectively augment graph data, we propose random propagation in GRAPH RANDOM NEURAL NETWORKS (GRAND), wherein each node’s features can be randomly dropped either partially or entirely, after which the perturbed feature matrix is propagated over the graph (see the sketch after this list)
  • GRAND improves upon the graph convolutional network (GCN) by margins of 3.9%, 5.1%, and 3.7% on Cora, Citeseer, and Pubmed, while GAT’s improvements over GCN were 1.5%, 2.2%, and 0%, respectively
  • We study the problem of semi-supervised learning on graphs and present the GRAPH RANDOM NEURAL NETWORKS (GRAND)
  • In GRAND, we propose the random propagation strategy to stochastically generate multiple graph data augmentations, based on which we utilize consistency regularization to improve the model’s generalization on unlabeled data
  • When compared to the very recent regularization based model—DropEdge, the proposed model achieves 2.6%, 3.1%, and 3.1% improvements, while DropEdge’s improvements over GCN were only 1.3%, 2.0%, and 0.6%, respectively
  • The simple and effective ideas presented in GRAND may offer a new perspective on graph neural network (GNN) design, in particular for semi-supervised graph learning
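The random propagation idea above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the authors' implementation: the row-wise DropNode perturbation, the rescaling by 1/(1 − drop_rate), and the mixed-order propagation over the symmetrically normalized adjacency are choices made for the sketch, and all function names and hyperparameter values are hypothetical.

```python
import numpy as np

def drop_node(X, drop_rate, rng):
    # Randomly zero out entire feature rows (node-level dropout), then rescale
    # so the perturbed features match the original ones in expectation.
    keep = (rng.random(X.shape[0]) >= drop_rate).astype(X.dtype)
    return X * keep[:, None] / (1.0 - drop_rate)

def random_propagate(X, A, K, drop_rate, rng):
    # Perturb node features, then propagate them over the graph.
    # A_hat is the symmetrically normalized adjacency with self-loops;
    # the output averages propagation orders 0..K (mixed-order propagation).
    n = A.shape[0]
    A_loop = A + np.eye(n)
    d_inv_sqrt = 1.0 / np.sqrt(A_loop.sum(axis=1))
    A_hat = A_loop * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    H = drop_node(X, drop_rate, rng)
    out, power = H, H
    for _ in range(K):
        power = A_hat @ power
        out = out + power
    return out / (K + 1)

# Two stochastic augmentations of a toy 4-node path graph.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.random((4, 3))
aug_1 = random_propagate(X, A, K=2, drop_rate=0.5, rng=rng)
aug_2 = random_propagate(X, A, K=2, drop_rate=0.5, rng=rng)
```

Because the dropped rows differ between calls, aug_1 and aug_2 are two different augmentations of the same graph data, which is what the consistency regularization described below relies on.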
Methods
  • Regularization Methods for GNNs

    Another line of work has aimed to design powerful regularization methods for GNNs, such as VBAT [10], GraphVAT [12], G3NN [29], GraphMix [42], and DropEdge [37].
  • GraphMix [42] introduces the MixUp strategy [49] for training GNNs. Different from GRAND, GraphMix augments graph data by performing linear interpolation between two samples in the hidden space, and regularizes GNNs by encouraging the model to predict the same interpolation of corresponding labels.
  • The idea is to design a random propagation strategy that stochastically generates multiple graph data augmentations, based on which the authors present consistency-regularized training to improve generalization capacity under the semi-supervised setting (a sketch is given below).
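To make the consistency-regularized training step concrete, here is a hedged NumPy sketch: each of the S augmentations (for instance, produced by the random propagation sketched above followed by a classifier) yields a prediction matrix, and the predictions are pulled toward their sharpened average. The temperature T, the squared-distance form of the loss, and the function names are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def sharpen(p, T):
    # Temperature sharpening of a class distribution; rows of p sum to 1.
    p_T = p ** (1.0 / T)
    return p_T / p_T.sum(axis=1, keepdims=True)

def consistency_loss(pred_list, T=0.5):
    # pred_list holds S arrays of shape (n_nodes, n_classes), each the softmax
    # output obtained from one random augmentation of the graph data.
    avg = np.mean(pred_list, axis=0)          # average prediction over augmentations
    target = sharpen(avg, T)                  # sharpened "pseudo-target"
    return float(np.mean([np.mean((p - target) ** 2) for p in pred_list]))
```

During training, such an unsupervised term would be added, weighted by λ, to the supervised cross-entropy computed on the labeled nodes only, so that unlabeled nodes also contribute a learning signal.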
Results
  • The results of GRAND are averaged over 100 runs with random weight initializations.
  • From the top part of Table 1, the authors observe that GRAND consistently outperforms all baselines by a large margin across all datasets.
  • The authors fix K and S to their best values and perform a grid search for T and λ over {0.1, 0.2, 0.3, 0.5} and {0.5, 0.7, 1.0}, respectively (a sketch of this search is given after this list).
  • For each hyperparameter configuration, the authors run the experiments with 20 random seeds and select the best configuration based on average accuracy on the validation set.
  • The authors did not spend much effort tuning these hyperparameters in practice, as they observe that GRAND is not very sensitive to them.
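The hyperparameter search described in this list amounts to a small grid search; the sketch below assumes a hypothetical train_and_evaluate(T, lam, seed) helper that trains the model once with K and S fixed and returns validation accuracy.

```python
import itertools
import numpy as np

def search_T_lambda(train_and_evaluate, n_seeds=20):
    # Grid search over the temperature T and the consistency weight lambda,
    # averaging validation accuracy over several random seeds per configuration.
    T_grid = [0.1, 0.2, 0.3, 0.5]
    lam_grid = [0.5, 0.7, 1.0]
    best_config, best_acc = None, -np.inf
    for T, lam in itertools.product(T_grid, lam_grid):
        accs = [train_and_evaluate(T, lam, seed) for seed in range(n_seeds)]
        mean_acc = float(np.mean(accs))
        if mean_acc > best_acc:
            best_config, best_acc = (T, lam), mean_acc
    return best_config, best_acc
```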
Conclusion
  • The authors study the problem of semi-supervised learning on graphs and present the GRAPH RANDOM NEURAL NETWORKS (GRAND).
  • In GRAND, the authors propose the random propagation strategy to stochastically generate multiple graph data augmentations, based on which the authors utilize consistency regularization to improve the model’s generalization on unlabeled data.
  • The simple and effective ideas presented in GRAND may offer a new perspective on GNN design, in particular for semi-supervised graph learning.
  • The authors aim to further improve the scalability of GRAND with sampling methods.
Objectives
  • The authors aim to further improve the scalability of GRAND with sampling methods.
  • The authors' goal is to predict the corresponding topic of each paper based on the feature matrix and the citation graph structure.
Tables
  • Table 1: Overall classification accuracy (%)
  • Table 2: Benchmark dataset statistics
  • Table 3: Hyperparameters of GRAND for the results in Table 1
  • Table 4: Statistics of large datasets
  • Table 5: Results on large datasets
  • Table 6: Classification accuracy under different label rates (%)
Related work
  • Let G = (V, E) denote a graph, where V is a set of |V| = n nodes and E ⊆ V × V is a set of |E| edges between nodes. A ∈ {0, 1}^{n×n} denotes the adjacency matrix of G, with A_ij = 1 indicating that there exists an edge between v_i and v_j, and A_ij = 0 otherwise.

    Semi-Supervised Learning on Graphs. This work focuses on semi-supervised graph learning, in which each node v_i is associated with 1) a feature vector X_i (the i-th row of the feature matrix X ∈ ℝ^{n×d}) and 2) a label vector Y_i (the i-th row of Y ∈ {0, 1}^{n×C}), with C denoting the number of classes. For semi-supervised classification, m nodes (0 < m ≪ n) have observed labels Y_L, and the labels Y_U of the remaining n − m nodes are missing. The objective is to learn a predictive function f : (G, X, Y_L) → Y_U to infer the missing labels Y_U for unlabeled nodes. Traditional approaches to this problem are mostly based on graph Laplacian regularization [52, 50, 31, 44, 2]. Recently, graph neural networks (GNNs) have emerged as a powerful approach for semi-supervised graph learning; they are reviewed below.
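Written as an optimization problem, the setup above amounts to fitting f_θ to the m labeled nodes; the cross-entropy form below, with V_L denoting the labeled node set, is a standard illustrative formulation rather than a formula quoted from the paper:

```latex
\min_{\theta}\; \mathcal{L}_{\mathrm{sup}}(\theta)
  = -\frac{1}{m}\sum_{i \in V_L}\sum_{c=1}^{C} Y_{ic}\,\log f_\theta(G, X)_{ic}
```

GRAND augments such a supervised loss with a consistency term, weighted by λ and computed on all nodes, that ties together the predictions obtained from its S random-propagation augmentations.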
Reference
  • Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Hrayr Harutyunyan, Nazanin Alipourfard, Kristina Lerman, Greg Ver Steeg, and Aram Galstyan. MixHop: Higher-order graph convolution architectures via sparsified neighborhood mixing. ICML'19, 2019.
  • Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7(Nov):2399–2434, 2006.
  • David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. MixMatch: A holistic approach to semi-supervised learning. NeurIPS'19, 2019.
  • Aleksandar Bojchevski and Stephan Günnemann. Deep Gaussian embedding of graphs: Unsupervised inductive learning via ranking. In ICLR, 2017.
  • Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv:1312.6203, 2013.
  • Olivier Chapelle, Bernhard Scholkopf, and Alexander Zien. Semi-supervised learning. IEEE Transactions on Neural Networks, 2009.
  • Deli Chen, Yankai Lin, Wei Li, Peng Li, Jie Zhou, and Xu Sun. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In AAAI'20, 2020.
  • Jie Chen, Tengfei Ma, and Cao Xiao. FastGCN: Fast learning with graph convolutional networks via importance sampling. arXiv:1801.10247, 2018.
  • Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NeurIPS'16, 2016.
  • Zhijie Deng, Yinpeng Dong, and Jun Zhu. Batch virtual adversarial training for graph convolutional networks. arXiv:1902.09192, 2019.
  • Ming Ding, Jie Tang, and Jie Zhang. Semi-supervised learning on graphs with generative adversarial nets. In CIKM'18, 2018.
  • Fuli Feng, Xiangnan He, Jie Tang, and Tat-Seng Chua. Graph adversarial training: Dynamically regularizing based on graph structure. IEEE Transactions on Knowledge and Data Engineering, 2019.
  • Hongyang Gao and Shuiwang Ji. Graph U-Nets. ICML'19, 2019.
  • Yang Gao, Hong Yang, Peng Zhang, Chuan Zhou, and Yue Hu. GraphNAS: Graph neural architecture search with reinforcement learning. arXiv:1904.09981, 2019.
  • Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. arXiv:1704.01212, 2017.
  • Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS'10, 2010.
  • Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. In IJCNN'05, 2005.
  • Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In NeurIPS'05, 2005.
  • Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NeurIPS'17, pages 1025–1035, 2017.
  • Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. arXiv:1506.05163, 2015.
  • Wenbing Huang, Tong Zhang, Yu Rong, and Junzhou Huang. Adaptive sampling towards fast graph representation learning. In NeurIPS'18, 2018.
  • Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML'15, 2015.
  • Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ICLR'14, 2014.
  • Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907, 2016.
  • Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. Predict then propagate: Graph neural networks meet personalized PageRank. arXiv:1810.05997, 2018.
  • Jure Leskovec and Rok Sosic. SNAP: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST), 8(1):1, 2016.
  • Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In AAAI'18, 2018.
  • Lingxiao Zhao and Leman Akoglu. PairNorm: Tackling oversmoothing in GNNs. ICLR'20, 2020.
  • Jiaqi Ma, Weijing Tang, Ji Zhu, and Qiaozhu Mei. A flexible generative framework for graph-based semi-supervised learning. NeurIPS'19, 2019.
  • Miller McPherson, Lynn Smith-Lovin, and James M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1):415–444, 2001.
  • Qiaozhu Mei, Duo Zhang, and ChengXiang Zhai. A general optimization framework for smoothing language models on graph structures. In SIGIR'08, pages 611–618, 2008.
  • Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):1979–1993, 2018.
  • Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodola, Jan Svoboda, and Michael M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model CNNs. In CVPR'17, 2017.
  • Kenta Oono and Taiji Suzuki. Graph neural networks exponentially lose expressive power for node classification. In ICLR'20, 2020.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In EMNLP'14, 2014.
  • Meng Qu, Yoshua Bengio, and Jian Tang. GMNN: Graph Markov neural networks. ICML'19, 2019.
  • Yu Rong, Wenbing Huang, Tingyang Xu, and Junzhou Huang. DropEdge: Towards deep graph convolutional networks on node classification. ICLR'20, 2020.
  • Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.
  • Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann. Pitfalls of graph neural network evaluation. arXiv:1811.05868, 2018.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 2014.
  • Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. ICLR'18, 2018.
  • Vikas Verma, Meng Qu, Alex Lamb, Yoshua Bengio, Juho Kannala, and Jian Tang. GraphMix: Regularized training of graph neural networks for semi-supervised learning, 2019.
  • Stefan Wager, Sida Wang, and Percy S. Liang. Dropout training as adaptive regularization. In NeurIPS'13, pages 351–359, 2013.
  • Jason Weston, Frédéric Ratle, Hossein Mobahi, and Ronan Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages 639–655.
  • Felix Wu, Tianyi Zhang, Amauri Holanda de Souza Jr, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. Simplifying graph convolutional networks. arXiv:1902.07153, 2019.
  • Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, and Quoc V. Le. Unsupervised data augmentation for consistency training. arXiv:1904.12848, 2019.
  • Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representation learning on graphs with jumping knowledge networks. arXiv:1806.03536, 2018.
  • Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. Revisiting semi-supervised learning with graph embeddings. ICML'16, 2016.
  • Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. ICLR'18, 2018.
  • Dengyong Zhou, Olivier Bousquet, Thomas N. Lal, Jason Weston, and Bernhard Schölkopf. Learning with local and global consistency. In NeurIPS'04, pages 321–328, 2004.
  • Dingyuan Zhu, Ziwei Zhang, Peng Cui, and Wenwu Zhu. Robust graph convolutional networks against adversarial attacks. In KDD'19, 2019.
  • Xiaojin Zhu, Zoubin Ghahramani, and John D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML'03, pages 912–919, 2003.
  • Difan Zou, Ziniu Hu, Yewen Wang, Song Jiang, Yizhou Sun, and Quanquan Gu. Layer-dependent importance sampling for training deep and large graph convolutional networks. In NeurIPS'19, 2019.
  • Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks on neural networks for graph data. In KDD'18, 2018.
  • Daniel Zügner and Stephan Günnemann. Adversarial attacks on graph neural networks via meta learning. ICLR'19, 2019.