GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, July 2020, pp. 1150–1160.

Keywords:
structural representation, Deep Graph Kernel, graph representation learning, graph dataset, pre-training (10+ more)
Weibo:
We study graph representation learning with the goal of characterizing and transferring structural features in social and information networks

Abstract:

Graph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph classification. However, prior arts on graph representation learning focus on domain-specific problems and train a dedicated model for each graph dataset, which is usually non-transferable to out-of-domain data.

Introduction
  • Representative graph structural patterns are universal and transferable across networks.
  • Barabási and Albert show that several types of networks, e.g., the World Wide Web, social networks, and biological networks, have the scale-free property, i.e., their degree distributions follow a power law [1] (see the formula after this list).
  • Other common patterns across networks include small world [58], motif distribution [31], community organization [34], and core-periphery structure [6], validating the hypothesis at the conceptual level
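    For concreteness, the scale-free property referenced above can be written as a power-law degree distribution (a standard formula from network science, not quoted from the paper):

      $P(k) \propto k^{-\gamma}$

    where P(k) is the fraction of vertices with degree k and the exponent γ is network-dependent, typically reported between 2 and 3 for real-world networks.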
Highlights
  • Recall that we focus on structural representation pre-training, while most graph neural network models require vertex features/attributes as input
  • We want to emphasize that Deep Graph Kernel, graph2vec and InfoGraph all need to be pre-trained on target domain graphs, but Graph Contrastive Coding only relies on the graphs listed in Table 1 for pre-training
  • We show that a graph neural network encoder pre-trained on several popular graph datasets can be directly adapted to new graph datasets and unseen graph learning tasks
  • We study graph representation learning with the goal of characterizing and transferring structural features in social and information networks
  • We present Graph Contrastive Coding (GCC), which is a graph-based contrastive learning framework to learn structural representations and similarity from data
Methods
  • The authors evaluate GCC on three graph learning tasks — node classification, graph classification, and similarity search, which have been commonly used to benchmark graph learning algorithms [12, 43, 46, 60, 61].
  • The authors first introduce the self-supervised pre-training settings in Section 4.1, and report GCC transfer learning results on three graph learning tasks in Section 4.2.
  • The authors' self-supervised pre-training is performed on six graph datasets, which can be categorized into two groups — academic graphs and social graphs.
  • As for academic graphs, the authors collect the Academia dataset from NetRep [44] as well as two DBLP datasets from SNAP [62] and NetRep [44], respectively.
Results
  • The authors compare GCC with ProNE [65], GraphWave [12], and Struc2vec [43]. Table 2 presents the results.
  • Compared with models trained from scratch, the reused model achieves competitive and sometimes better performance
  • This demonstrates the transferability of graph structural patterns and the effectiveness of the GCC framework in capturing these patterns.
  • It is still unclear whether GCC’s good performance comes from pre-training or from the expressive power of its GIN [60] encoder.
  • To answer this question, the authors fully fine-tune GCC with its GIN encoder randomly initialized, which is equivalent to training a GIN encoder from scratch (see the sketch after this list).
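    The ablation can be pictured with a minimal sketch (not the authors' released code); GINEncoder and gcc_pretrained.pt below are hypothetical placeholders for the actual GIN encoder and pre-trained checkpoint:

      # Two settings: fine-tune from GCC pre-trained weights, or train the same
      # encoder from random initialization ("from scratch").
      import torch
      import torch.nn as nn

      class GINEncoder(nn.Module):
          """Stand-in for GCC's multi-layer GIN encoder (placeholder architecture)."""
          def __init__(self, in_dim=64, hidden_dim=64):
              super().__init__()
              self.layers = nn.Sequential(
                  nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim)
              )

          def forward(self, x):
              return self.layers(x)

      def build_encoder(pretrained_path=None):
          encoder = GINEncoder()                      # random init = "from scratch" baseline
          if pretrained_path is not None:             # GCC setting: reuse pre-trained weights
              encoder.load_state_dict(torch.load(pretrained_path))
          return encoder                              # all parameters are then fine-tuned end-to-end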
Conclusion
  • Discussion on graph sampling

    In random walk with restart sampling, the restart probability controls the radius r of the ego-network on which GCC performs data augmentation (see the sketch after this list).
  • Each sampled subgraph’s generalized positional embedding is defined as the top eigenvectors of its normalized graph Laplacian.
  • Suppose a subgraph has adjacency matrix A and degree matrix D; the authors perform an eigen-decomposition of its normalized graph Laplacian, $I - D^{-1/2} A D^{-1/2} = U \Lambda U^\top$, and define the top eigenvectors in U [55] as the generalized positional embedding (also computed in the sketch below).
  • In this work, the authors study graph representation learning with the goal of characterizing and transferring structural features in social and information networks.
  • The authors would like to explore applications of GCC on graphs in other domains, such as protein-protein association networks [47]
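    The two ingredients above can be sketched as follows (assuming networkx/numpy; an illustrative sketch, not the authors' released implementation, with restart_prob, walk_length, and dim as illustrative choices): first sample an ego-subgraph by random walk with restart, then compute its generalized positional embedding from the normalized graph Laplacian.

      import numpy as np
      import networkx as nx

      def rwr_subgraph(graph, seed, restart_prob=0.8, walk_length=128, rng_seed=0):
          """Random walk with restart from `seed`; a higher restart probability keeps
          the walk closer to the seed, i.e., a smaller ego-network radius r."""
          rng = np.random.default_rng(rng_seed)
          visited, v = {seed}, seed
          for _ in range(walk_length):
              neighbors = list(graph.neighbors(v))
              if rng.random() < restart_prob or not neighbors:
                  v = seed                               # restart at the ego vertex
              else:
                  v = neighbors[rng.integers(len(neighbors))]
              visited.add(v)
          return graph.subgraph(visited).copy()

      def generalized_positional_embedding(subgraph, dim=32):
          """Eigen-decompose the normalized Laplacian I - D^{-1/2} A D^{-1/2} = U Λ U^T and
          return the leading eigenvectors (one reading of the paper's "top eigenvectors")."""
          A = nx.to_numpy_array(subgraph)
          d = A.sum(axis=1)
          d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
          L = np.eye(len(d)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
          eigenvalues, U = np.linalg.eigh(L)             # eigenvalues in ascending order
          k = min(dim, U.shape[1])
          return U[:, ::-1][:, :k]                       # eigenvectors of the largest eigenvalues

      # Example: one ego-subgraph from a synthetic scale-free graph and its embedding.
      G = nx.barabasi_albert_graph(200, 3, seed=1)
      sub = rwr_subgraph(G, seed=0)
      emb = generalized_positional_embedding(sub)        # shape: (|V(sub)|, <= 32)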
Tables
  • Table1: Datasets for pre-training, sorted by number of vertices
  • Table2: Node classification
  • Table3: Graph classification
  • Table4: Top-k similarity search (k = 20, 40)
  • Table5: Momentum ablation
  • Table6: Pre-training hyper-parameters for E2E and MoCo
  • Table7: Performance of GIN model under various hyperparameter configurations
Related work
  • In this section, we review related work on vertex similarity, contrastive learning, and graph pre-training.

    2.1 Vertex Similarity

    Quantifying the similarity of vertices in networks/graphs has been studied extensively for many years. The goal of vertex similarity is to answer questions [26] like “How similar are these two vertices?” or “Which other vertices are most similar to these vertices?” The definition of similarity varies with the situation. We briefly review the following three types of vertex similarity.

    Neighborhood similarity The basic assumption of neighborhood similarity, a.k.a. proximity, is that closely connected vertices should be considered similar. Early neighborhood similarity measures include Jaccard similarity (counting common neighbors), RWR similarity [36], and SimRank [21]. Most recently developed network embedding algorithms, such as LINE [48], DeepWalk [39], and node2vec [14], also follow the neighborhood similarity assumption (a toy example follows below).
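    As a toy illustration of the neighborhood-similarity assumption (a sketch, not taken from the paper, reusing the networkx assumption from the earlier sketch), Jaccard similarity scores two vertices by how much their neighbor sets overlap:

      import networkx as nx

      def jaccard_similarity(graph, u, v):
          """|N(u) ∩ N(v)| / |N(u) ∪ N(v)|: vertices sharing many neighbors score higher."""
          nu, nv = set(graph.neighbors(u)), set(graph.neighbors(v))
          union = nu | nv
          return len(nu & nv) / len(union) if union else 0.0

      G = nx.karate_club_graph()
      print(jaccard_similarity(G, 0, 1))   # two well-connected vertices in the karate-club graph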
Funding
  • The work is supported by the National Key R&D Program of China (2018YFB1402600), NSFC for Distinguished Young Scholar (61825602), and NSFC (61836013)
Reference
  • Réka Albert and Albert-László Barabási. 2002. Statistical mechanics of complex networks. Reviews of modern physics 74, 1 (2002), 47.
  • J Ignacio Alvarez-Hamelin, Luca Dall’Asta, Alain Barrat, and Alessandro Vespignani. 2006. Large scale networks fingerprinting and visualization using the k-core decomposition. In Advances in neural information processing systems. 41–50.
  • Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: membership, growth, and evolution. In KDD ’06. 44–54.
  • Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261 (2018).
  • Austin R Benson, David F Gleich, and Jure Leskovec. 2016. Higher-order organization of complex networks. Science 353, 6295 (2016), 163–166.
  • Stephen P Borgatti and Martin G Everett. 2000. Models of core/periphery structures. Social networks 21, 4 (2000), 375–395.
  • Ronald S Burt. 2009. Structural holes: The social structure of competition. Harvard university press.
  • Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM transactions on intelligent systems and technology (TIST) 2, 3 (2011), 1–27.
  • Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In ICLR ’20.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT ’19. 4171–4186.
  • Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD ’17. 135–144.
  • Claire Donnat, Marinka Zitnik, David Hallac, and Jure Leskovec. 2018. Learning structural node embeddings via diffusion wavelets. In KDD ’18. 1320–1329.
  • Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. 2017. Neural message passing for quantum chemistry. In ICML ’17. JMLR. org, 1263–1272.
  • Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD ’16. 855–864.
  • Raia Hadsell, Sumit Chopra, and Yann LeCun. 2006. Dimensionality reduction by learning an invariant mapping. In CVPR ’06, Vol. 2. IEEE, 1735–1742.
  • Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in neural information processing systems. 1024–1034.
  • Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In CVPR ’20. 9729–9738.
  • Keith Henderson, Brian Gallagher, Tina Eliassi-Rad, Hanghang Tong, Sugato Basu, Leman Akoglu, Danai Koutra, Christos Faloutsos, and Lei Li. 2012. Rolx: structural role extraction & mining in large graphs. In KDD ’12. 1231–1239.
  • Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, and Jure Leskovec. 2020. Strategies for Pre-training Graph Neural Networks. In ICLR ’20.
  • Ziniu Hu, Changjun Fan, Ting Chen, Kai-Wei Chang, and Yizhou Sun. 2019. Unsupervised Pre-Training of Graph Convolutional Networks. ICLR 2019 Workshop: Representation Learning on Graphs and Manifolds (2019).
  • Glen Jeh and Jennifer Widom. 2002. SimRank: a measure of structural-context similarity. In KDD ’02. 538–543.
  • Yilun Jin, Guojie Song, and Chuan Shi. 2019. GraLSP: Graph Neural Networks with Local Structural Patterns. arXiv preprint arXiv:1911.07675 (2019).
  • Kristian Kersting, Nils M. Kriege, Christopher Morris, Petra Mutzel, and Marion Neumann. 2016. Benchmark Data Sets for Graph Kernels. http://graphkernels.cs.tu-dortmund.de
  • Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. ICLR ’15.
  • Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR ’17.
  • Elizabeth A Leicht, Petter Holme, and Mark EJ Newman. 2006. Vertex similarity in networks. Physical Review E 73, 2 (2006), 026120.
  • Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In KDD ’06. 631–636.
  • Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2005. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD ’05. 177–187.
  • Silvio Micali and Zeyuan Allen Zhu. 2016. Reconstructing markov processes from independent and anonymous experiments. Discrete Applied Mathematics 200 (2016), 108–122.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111–3119.
  • Ron Milo, Shalev Itzkovitz, Nadav Kashtan, Reuven Levitt, Shai Shen-Orr, Inbal Ayzenshtat, Michal Sheffer, and Uri Alon. 2004. Superfamilies of evolved and designed networks. Science 303, 5663 (2004), 1538–1542.
  • Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. 2002. Network motifs: simple building blocks of complex networks. Science 298, 5594 (2002), 824–827.
  • Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. 2017. graph2vec: Learning distributed representations of graphs. arXiv preprint arXiv:1707.05005 (2017).
  • Mark EJ Newman. 2006. Modularity and community structure in networks. Proceedings of the national academy of sciences 103, 23 (2006), 8577–8582.
  • Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018).
  • Jia-Yu Pan, Hyung-Jeong Yang, Christos Faloutsos, and Pinar Duygulu. 2004. Automatic multimedia cross-modal correlation discovery. In KDD ’04. 653–658.
  • Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems. 8024–8035.
  • Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of machine learning research 12, Oct (2011), 2825–2830.
  • Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD ’14. 701–710.
  • Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Chi Wang, Kuansan Wang, and Jie Tang. 2019. Netsmf: Large-scale network embedding as sparse matrix factorization. In The World Wide Web Conference. 1509–1520.
  • Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM ’18. 459–467.
  • Jiezhong Qiu, Jian Tang, Hao Ma, Yuxiao Dong, Kuansan Wang, and Jie Tang. 2018. Deepinf: Social influence prediction with deep learning. In KDD ’18. 2110–2119.
  • Leonardo FR Ribeiro, Pedro HP Saverese, and Daniel R Figueiredo. 2017. struc2vec: Learning node representations from structural identity. In KDD ’17. 385–394.
  • Scott C Ritchie, Stephen Watts, Liam G Fearnley, Kathryn E Holt, Gad Abraham, and Michael Inouye. 2016. A scalable permutation approach reveals replication and preservation patterns of network modules in large datasets. Cell systems 3, 1 (2016), 71–82.
  • Daniel A Spielman and Shang-Hua Teng. 2013. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM Journal on computing 42, 1 (2013), 1–26.
  • Fan-Yun Sun, Jordan Hoffman, Vikas Verma, and Jian Tang. 2019. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. In ICLR ’19.
  • Damian Szklarczyk, John H Morris, Helen Cook, Michael Kuhn, Stefan Wyder, Milan Simonovic, Alberto Santos, Nadezhda T Doncheva, Alexander Roth, Peer Bork, et al. 2016. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic acids research (2016), gkw937.
  • Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW ’15. 1067–1077.
  • Shang-Hua Teng et al. 2016. Scalable algorithms for data and network analysis. Foundations and Trends® in Theoretical Computer Science 12, 1–2 (2016), 1–274.
  • Yonglong Tian, Dilip Krishnan, and Phillip Isola. 2019. Contrastive multiview coding. arXiv preprint arXiv:1906.05849 (2019).
  • Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast random walk with restart and its applications. In ICDM ’06. IEEE, 613–622.
  • Johan Ugander, Lars Backstrom, Cameron Marlow, and Jon Kleinberg. 2012. Structural diversity in social contagion. Proceedings of the National Academy of Sciences 109, 16 (2012), 5962–5966.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998–6008.
  • Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. ICLR ’18 (2018).
  • Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and computing 17, 4 (2007), 395–416.
  • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In ICLR ’19.
  • Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, et al. 2019. Deep graph library: Towards efficient and scalable deep learning on graphs. arXiv preprint arXiv:1909.01315 (2019).
  • Duncan J Watts and Steven H Strogatz. 1998. Collective dynamics of small-world networks. nature 393, 6684 (1998), 440.
  • Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. 2018. Unsupervised feature learning via non-parametric instance discrimination. In CVPR ’18. 3733–3742.
  • Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How Powerful are Graph Neural Networks?. In ICLR ’19.
  • Pinar Yanardag and SVN Vishwanathan. 2015. Deep graph kernels. In KDD ’15. 1365–1374.
  • Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems 42, 1 (2015), 181–213.
  • Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In KDD ’18. 974–983.
  • Fanjin Zhang, Xiao Liu, Jie Tang, Yuxiao Dong, Peiran Yao, Jie Zhang, Xiaotao Gu, Yan Wang, Bin Shao, Rui Li, and et al. 2019. OAG: Toward Linking Large-Scale Heterogeneous Entity Graphs. In KDD ’19. 2585–2595.
  • Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang, and Ming Ding. 2019. ProNE: fast and scalable network representation learning. In IJCAI ’19. 4278–4284.
  • Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li. 2015. Panther: Fast top-k similarity search on large networks. In KDD ’15. 1445–1454.
  • Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. 2018. An end-to-end deep learning architecture for graph classification. In AAAI ’18.
  • [12] We download the authors’ official source code and keep all the training settings as the same. The implementation requires a networkx graph and time points as input. We convert our dataset to the networkx format, and use automatic selection of the range of scales provided by the authors. We set the output embedding dimension to 64.
  • [43] We download the authors’ official source code and use default hyper-parameters provided by the authors: (1) walk length = 80; (2) number of walks = 10; (3) window size = 10; (4) number of iterations = 5.
  • [12] Embeddings computed by the GraphWave method also have the ability to generalize across graphs. The authors evaluated on synthetic graphs in their paper which are not publicly available. To compare with GraphWave on the co-author datasets, we compute GraphWave embeddings given two graphs G1 and G2 and follow the same procedure mentioned in section 4.2.2 to compute the HITS@10 (top-10 accuracy) score.
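    The HITS@10 computation mentioned here can be sketched as follows (illustrative only; it assumes the two graphs' vertices are index-aligned so that row i's ground-truth counterpart is column i, which is our simplifying assumption):

      import numpy as np

      def hits_at_k(sim_matrix, k=10):
          """Top-k accuracy: for each query vertex (row), check whether its ground-truth
          counterpart (assumed to be the same index in the other graph) is among the
          k most-similar candidates (columns)."""
          ranking = np.argsort(-sim_matrix, axis=1)      # candidates sorted by similarity, descending
          hits = [i in ranking[i, :k] for i in range(sim_matrix.shape[0])]
          return float(np.mean(hits))

      # Example with random embeddings for two aligned graphs G1 and G2.
      rng = np.random.default_rng(0)
      z1, z2 = rng.normal(size=(50, 64)), rng.normal(size=(50, 64))
      print(hits_at_k(z1 @ z2.T, k=10))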