Self-supervised Learning on Graphs: Deep Insights and New Direction

Keywords:
unlabeled node, real world, SSL pretext, pretext task, graph domain

Abstract:

The success of deep learning notoriously requires larger amounts of costly annotated data. This has led to the development of self-supervised learning (SSL) that aims to alleviate this limitation by creating domain-specific pretext tasks on unlabeled data. Simultaneously, there are increasing interests in generalizing deep learning to the graph domain.

Introduction
  • Deep learning has achieved superior performance across numerous domains; but it requires costly annotations of huge amounts of data [1].
  • SSL typically first designs a domain-specific pretext task that assigns labels to data instances, and then trains the deep model on this pretext task; including unlabeled samples in training this way leads to better representations.
  • GNNs are inherently semi-supervised, as unlabeled data is coherently integrated into their training.
  • To fully exploit the unlabeled nodes for GNNs, SSL can therefore be naturally harnessed to provide additional supervision
Highlights
  • In recent years, deep learning has achieved superior performance across numerous domains; but it requires costly annotations of huge amounts of data [1]
  • There are a variety of potential pretext tasks for graphs; it is important to gain insights on when and why self-supervised learning (SSL) works for graph neural networks (GNNs) and which strategy can better integrate SSL for GNNs
  • Given a dataset in the graph domain represented as a graph G = (A, X) with paired labeled data D_L = (V_L, Y_L), we aim to construct a self-supervised pretext task with a corresponding loss L_self that can be integrated with the task-specific loss L_task to learn a graph neural network f_θ that can better generalize on the unlabeled data (a minimal joint-loss sketch is given after this list)
  • To facilitate this line of research, we have carefully studied SSL in GNNs for the task of node classification
  • We first introduce various basic SSL pretext tasks for graphs and present a detailed empirical study to understand when and why SSL works for GNNs and which strategy can better work with GNNs
  • Extensive experiments on real-world datasets demonstrate that our advanced method achieves state-of-the-art performance
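  A minimal joint-loss sketch is given below (PyTorch; the module and head names are hypothetical, not the paper's code). It only illustrates the combination L = L_task + λ·L_self from the objective above, where a node-classification head and a pretext head share the same GNN encoder:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GCNLayer(nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)

        def forward(self, adj_norm, x):
            # adj_norm: normalized adjacency matrix (dense here for simplicity)
            return adj_norm @ self.linear(x)

    class JointGCN(nn.Module):
        """Two-layer GCN encoder with a task head and a pretext (SSL) head."""
        def __init__(self, in_dim, hid_dim, n_classes, n_pretext_classes):
            super().__init__()
            self.gc1 = GCNLayer(in_dim, hid_dim)
            self.task_head = GCNLayer(hid_dim, n_classes)
            self.ssl_head = nn.Linear(hid_dim, n_pretext_classes)

        def forward(self, adj_norm, x):
            h = F.relu(self.gc1(adj_norm, x))
            return self.task_head(adj_norm, h), self.ssl_head(h)

    def joint_loss(model, adj_norm, x, labeled_idx, y_task,
                   pretext_idx, y_pretext, lam=0.5):
        # L = L_task + lambda * L_self, each computed on its own node set
        logits_task, logits_ssl = model(adj_norm, x)
        l_task = F.cross_entropy(logits_task[labeled_idx], y_task)
        l_self = F.cross_entropy(logits_ssl[pretext_idx], y_pretext)
        return l_task + lam * l_self

  Here lam (λ) is a weighting hyperparameter, and y_pretext are the labels produced by whichever pretext task is used.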
Methods
  • The authors evaluate the effectiveness of the proposed SelfTask pretext tasks presented in Section 5.
  • To validate the proposed approaches, the authors conduct experiments on four benchmark datasets: Cora, Citeseer and Pubmed [7] (statistics in Table 1), and Reddit [8].
  • The hyper-parameters of all the models are tuned based on the loss and accuracy on the validation set.
  • In addition to the vanilla 2-layer GCN [7], the authors include two recent SSL methods on graph neural networks as baselines: (1) Self-Training [21], which first trains a graph neural network and then adds the most confident predictions on unlabeled data to the label set as pseudo-labels for later training; and (2) M3S [10], which repeatedly assigns pseudo-labels and trains on the augmented labeled set for K rounds, employing DeepCluster [3] and Self-Training to perform self-checking on the generated pseudo-labels (a minimal pseudo-labeling sketch follows this list)
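  A minimal sketch of the pseudo-labeling step behind the Self-Training baseline (PyTorch; the helper name and the per-class selection size are assumptions for illustration): after one round of training, the most confident predictions on unlabeled nodes are appended to the label set for the next round.

    import torch
    import torch.nn.functional as F

    def add_pseudo_labels(logits, labeled_idx, labels, n_per_class=10):
        """Append the most confident unlabeled nodes (per predicted class)
        to the labeled set as pseudo-labeled examples."""
        probs = F.softmax(logits, dim=1)
        conf, pred = probs.max(dim=1)
        unlabeled = torch.ones(logits.size(0), dtype=torch.bool)
        unlabeled[labeled_idx] = False

        new_idx, new_lab = [], []
        for c in range(logits.size(1)):
            cand = torch.where(unlabeled & (pred == c))[0]
            if cand.numel() == 0:
                continue
            top = cand[conf[cand].argsort(descending=True)[:n_per_class]]
            new_idx.append(top)
            new_lab.append(pred[top])

        labeled_idx = torch.cat([labeled_idx, *new_idx])
        labels = torch.cat([labels, *new_lab])
        return labeled_idx, labels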
Results
  • Extensive experiments on real-world datasets demonstrate that the advanced method achieves state-of-the-art performance.
Conclusion
  • Applying self-supervised learning to GNNs is a cutting-edge research topic with great potential.
  • To facilitate this line of research, the authors have carefully studied SSL in GNNs for the task of node classification.
  • The authors first introduce various basic SSL pretext tasks for graphs and present a detailed empirical study to understand when and why SSL works for GNNs and which strategy can better work with GNNs. Based on these insights, the authors propose a new direction, SelfTask, to build advanced pretext tasks that further exploit task-specific self-supervised information.
  • Future work can be done on exploring new pretext tasks and applying the proposed SSL strategies in pre-training graph neural networks
Summary
  • Introduction:

    Deep learning has achieved superior performance across numerous domains; but it requires costly annotations of huge amounts of data [1].
  • SSL typically first designs a domain-specific pretext task that assigns labels to data instances, and then trains the deep model on this pretext task; including unlabeled samples in training this way leads to better representations.
  • GNNs are inherently semi-supervised, as unlabeled data is coherently integrated into their training.
  • To fully exploit the unlabeled nodes for GNNs, SSL can therefore be naturally harnessed to provide additional supervision
  • Objectives:

    Given a dataset in the graph domain represented as a graph G = (A, X) with paired labeled data D_L = (V_L, Y_L), the authors aim to construct a self-supervised pretext task with a corresponding loss L_self that can be integrated with the task-specific loss L_task to learn a graph neural network f_θ that can better generalize on the unlabeled data.
  • The authors further develop PairwiseDistance, which aims to guide the graph neural network to maintain global topology information through pairwise comparisons (a sketch of constructing such pretext labels is given after this summary).
  • The authors also pursue a new direction beyond structure and attribute information that takes the specific downstream task into consideration
  • Methods:

    The authors evaluate the effectiveness of the proposed SelfTask pretext tasks presented in Section 5.
  • To validate the proposed approaches, the authors conduct experiments on four benchmark datasets: Cora, Citeseer and Pubmed [7] (statistics in Table 1), and Reddit [8].
  • The hyper-parameters of all the models are tuned based on the loss and accuracy on the validation set.
  • In addition to the vanilla 2-layer GCN [7], the authors include two recent SSL methods on graph neural networks as baselines: (1) Self-Training [21], which first trains a graph neural network and then adds the most confident predictions on unlabeled data to the label set as pseudo-labels for later training; and (2) M3S [10], which repeatedly assigns pseudo-labels and trains on the augmented labeled set for K rounds, employing DeepCluster [3] and Self-Training to perform self-checking on the generated pseudo-labels
  • Results:

    Extensive experiments on real-world datasets demonstrate that the advanced method achieves state-of-the-art performance.
  • Conclusion:

    Applying self-supervised learning to GNNs is a cutting-edge research topic with great potential.
  • To facilitate this line of research, the authors have carefully studied SSL in GNNs for the task of node classification.
  • The authors first introduce various basic SSL pretext tasks for graphs and present a detailed empirical study to understand when and why SSL works for GNNs and which strategy can better work with GNNs. Based on these insights, the authors propose a new direction, SelfTask, to build advanced pretext tasks that further exploit task-specific self-supervised information.
  • Future work can be done on exploring new pretext tasks and applying the proposed SSL strategies in pre-training graph neural networks
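  As referenced in the Objectives above, the following sketch (networkx; a hypothetical construction, the paper's exact recipe may differ) shows how pretext labels for a PairwiseDistance-style task could be built by bucketing shortest-path distances between sampled node pairs:

    import random
    import networkx as nx

    def pairwise_distance_labels(graph, n_pairs=1000, max_dist=4):
        """Sample node pairs and label each pair with its (truncated)
        shortest-path distance, giving max_dist pretext classes."""
        nodes = list(graph.nodes())
        pairs, labels = [], []
        while len(pairs) < n_pairs:
            u, v = random.sample(nodes, 2)
            try:
                d = nx.shortest_path_length(graph, u, v)
            except nx.NetworkXNoPath:
                continue  # skip disconnected pairs
            pairs.append((u, v))
            labels.append(min(d, max_dist) - 1)  # classes 0 .. max_dist-1
        return pairs, labels

  Training the GNN to predict these classes from pairs of node embeddings encourages it to preserve global topology beyond immediate neighborhoods.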
Tables
  • Table1: Dataset statistics
  • Table2: Two-stage training strategies on Cora
  • Table3: Performance evaluation of using SSL for GNNs
  • Table4: Node classification performance accuracy (%) of integrating SSL into GNNs
Related work
  • In this section, we introduce the related work including self-supervised learning and graph neural networks.

    7.1 Self-supervised Learning

    SSL is a novel learning framework that generates additional supervised signals to train deep learning models through carefully designed pretext tasks. SSL has been proven to effectively alleviate the problem of the lack of labeled training data [1]. In the image domain, various self-supervised learning techniques have been developed for learning high-level image representations. Doersch et al. [2] first proposed to predict the relative locations of image patches. Following this line of research, Noroozi et al. [22] designed a pretext task called Jigsaw Puzzle. More types of pretext tasks have also been investigated, such as image rotation [23], image clustering [3], image inpainting [24], image colorization [25] and motion segmentation prediction [26]. In the domain of graphs, there are a few works incorporating SSL. Sun et al. [10] utilized the clustering assignments of node embeddings as guidance to update the graph neural networks. Peng et al. [11] proposed to use the global context of nodes as the supervisory signal to learn node embeddings.

    7.2 Graph Neural Networks

    GNNs can be roughly categorized into spectral methods and spatial methods. Spectral methods were initially developed based on spectral theory [7, 27, 28]. Bruna et al. [27] first extended the notion of convolution to non-grid structures. Afterward, a simplified version of spectral GNNs called ChebNet [28] was developed. Kipf et al. [7] then proposed GCN, which further simplifies ChebNet based on its first-order approximation. Later, Wu et al. [29] proposed Simple Graph Convolution (SGC), which simplifies GCN by removing nonlinearities and collapsing weight matrices (see the sketch after this subsection). Spatial methods consider the topological structure of the graph and aggregate the information of nodes according to local information [8, 9]. Hamilton et al. [8] proposed an inductive learning method called GraphSAGE for large-scale networks. Velickovic et al. [9] proposed the graph attention network (GAT), which introduces an attention mechanism into graph convolutions. Further, Rong et al. [30] developed deep graph convolutional networks by applying the DropEdge mechanism to randomly drop edges during training. For a thorough review, please refer to recent surveys [31, 32].
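    A minimal numpy sketch of the simplification behind SGC [29] (illustrative only, not the official implementation): removing the nonlinearities of a K-layer GCN collapses it into a linear model applied to pre-propagated features S^K X, where S is the normalized adjacency matrix with self-loops.

      import numpy as np

      def sgc_features(adj, x, k=2):
          """Compute S^K X with S = D^{-1/2} (A + I) D^{-1/2}."""
          a_hat = adj + np.eye(adj.shape[0])
          d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
          s = d_inv_sqrt @ a_hat @ d_inv_sqrt
          for _ in range(k):
              x = s @ x
          return x  # a simple logistic regression is then trained on these features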
Reference
  • [1] Alexander Kolesnikov, Xiaohua Zhai, and Lucas Beyer. Revisiting self-supervised visual representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1920–1929, 2019.
  • [2] Carl Doersch, Abhinav Gupta, and Alexei A. Efros. Unsupervised visual representation learning by context prediction. In ICCV, 2015.
  • [3] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), pages 132–149, 2018.
  • [4] Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In International Conference on Machine Learning, 2014.
  • [5] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  • [6] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596, 2019.
  • [7] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • [8] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017.
  • [9] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018.
  • [10] Ke Sun, Zhouchen Lin, and Zhanxing Zhu. Multi-stage self-supervised learning for graph convolutional networks on graphs with few labels. arXiv preprint arXiv:1902.11038, 2019.
  • [11] Zhen Peng, Yixiang Dong, Minnan Luo, Xiao-Ming Wu, and Qinghua Zheng. Self-supervised graph representation learning via global context prediction. arXiv preprint arXiv:2003.01604, 2020.
  • [12] David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
  • [13] George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1998.
  • [14] Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov, and Lucas Beyer. S4L: Self-supervised semi-supervised learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 1476–1485, 2019.
  • [15] Mehdi Noroozi, Ananth Vinjimoor, Paolo Favaro, and Hamed Pirsiavash. Boosting self-supervised learning via knowledge transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
  • [16] Mark Newman. Networks. Oxford University Press, 2018.
  • [17] Stephen P. Borgatti and Martin G. Everett. Notions of position in social network analysis. Sociological Methodology, pages 1–35, 1992.
  • [18] Xiaojin Zhu, Zoubin Ghahramani, and John D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003.
  • [19] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 2008.
  • [20] Jiangfan Han, Ping Luo, and Xiaogang Wang. Deep self-learning from noisy labels. In Proceedings of the IEEE International Conference on Computer Vision, pages 5138–5147, 2019.
  • [21] Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • [22] Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision, pages 69–84, 2016.
  • [23] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728, 2018.
  • [24] Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  • [25] Richard Zhang, Phillip Isola, and Alexei A. Efros. Colorful image colorization. In European Conference on Computer Vision, pages 649–666, 2016.
  • [26] Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. Learning features by watching objects move. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2701–2710, 2017.
  • [27] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
  • [28] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.
  • [29] Felix Wu, Tianyi Zhang, Amauri Holanda de Souza Jr, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. Simplifying graph convolutional networks. arXiv preprint arXiv:1902.07153, 2019.
  • [30] Yu Rong, Wenbing Huang, Tingyang Xu, and Junzhou Huang. The truly deep graph convolutional networks for node classification. arXiv preprint arXiv:1907.10903, 2019.
  • [31] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2020.
  • [32] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. Graph neural networks: A review of methods and applications. arXiv preprint arXiv:1812.08434, 2018.