GPT-GNN: Generative Pre-Training of Graph Neural Networks

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, July 2020, pp. 1857–1867.

Keywords:
edge generation, GPT-GNN, heterogeneous graph transformer, heterogeneous information network, large scale, pre-training
We present the GPT-GNN framework to initialize GNNs by generative pre-training

Abstract:

Graph neural networks (GNNs) have been demonstrated to be powerful in modeling graph-structured data. However, training GNNs requires abundant task-specific labeled data, which is often arduously expensive to obtain. One effective way to reduce the labeling effort is to pre-train an expressive GNN model on unlabelled data with self-supervision ...

Introduction
  • The breakthroughs in graph neural networks (GNNs) have revolutionized graph mining from structural feature engineering to representation learning [1, 9, 17].
  • For different tasks on the same graph, sufficient and distinct sets of labeled data are required to train a dedicated GNN for each task.
  • It is arduously expensive, and sometimes infeasible, to obtain sufficient labeled data for these tasks, especially for large-scale graphs.
  • For example, the author disambiguation task in academic graphs [34] still faces the challenge of lacking ground truth to date
Highlights
  • The breakthroughs in graph neural networks (GNNs) have revolutionized graph mining from structural feature engineering to representation learning [1, 9, 17]
  • We summarize the performance of downstream tasks with different pre-training methods on Open Academic Graph and Amazon in Table 1
  • The proposed GPT-GNN framework significantly enhances the performance of all downstream tasks on both datasets
  • We pre-train and fine-tune on two homogeneous graphs: 1) the paper citation network extracted from the field of computer science in Open Academic Graph, on which the topic of each paper is predicted; 2) the Reddit network consisting of Reddit posts, on which the community of each post is inferred
  • We observe that the downstream tasks on both homogeneous graphs benefit from all pre-training frameworks, among which the proposed GPT-GNN offers the largest performance gains
  • We propose to separate the attribute and edge generation nodes to avoid information leakage
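To make the information-leakage point concrete, here is a minimal PyTorch sketch (our illustration, not the authors' code) of keeping two copies of each node to be generated: an Attribute-Generation copy whose input attribute is replaced by a shared dummy token, and an Edge-Generation copy that keeps the real attribute. The attribute loss is computed only on the masked copy, so the model never observes the attribute it must predict.

    import torch

    def split_attribute_edge_copies(x, target_mask, dummy_token):
        # x: [N, d] input node attributes; target_mask: [N] bool mask of nodes to generate;
        # dummy_token: [d] shared vector (hypothetical names, for illustration only).
        x_attr = x.clone()
        x_attr[target_mask] = dummy_token   # Attribute-Generation copies: attribute hidden
        x_edge = x                          # Edge-Generation copies: attribute visible
        return x_attr, x_edge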
Methods
  • The authors use the Reddit dataset [12] and the paper citation network extracted from OAG.
  • Open Academic Graph (OAG) [34, 38, 44] contains more than 178 million nodes and 2.236 billion edges.
  • It is the largest publicly available heterogeneous academic dataset to date.
  • The performance is evaluated by MRR—a widely adopted ranking metric [19]
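For reference, MRR (mean reciprocal rank) averages the reciprocal of the rank at which the ground-truth candidate appears for each query; a minimal computation in plain Python (our illustration, not the paper's evaluation script):

    def mean_reciprocal_rank(ranks):
        # ranks: 1-based rank of the ground-truth item for each query
        return sum(1.0 / r for r in ranks) / len(ranks)

    # e.g. ground truth ranked 1st, 3rd, and 2nd across three queries:
    # mean_reciprocal_rank([1, 3, 2])  ->  (1 + 1/3 + 1/2) / 3 ≈ 0.611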
Results
  • The authors summarize the performance of downstream tasks with different pre-training methods on OAG and Amazon in Table 1.
  • GPT-GNN achieves relative performance gains of 13.3% and 5.7% over the base model without pre-training on OAG and Amazon, respectively
  • It consistently outperforms other pre-training frameworks, such as Graph Infomax, across different downstream tasks for all three transfer settings on both datasets.
  • The authors observe that the downstream tasks on both homogeneous graphs benefit from all pre-training frameworks, among which the proposed GPT-GNN offers the largest performance gains
Conclusion
  • The authors study the problem of graph neural network pre-training and present GPT-GNN, a generative GNN pre-training framework.
  • In GPT-GNN, the authors design the graph generation factorization to guide the base GNN model to autoregressively reconstruct both the attributes and the structure of the input graph (a toy sketch of the combined objective is given after this list).
  • The pre-trained GNNs, fine-tuned on only a few labeled data, achieve significant performance gains on various downstream tasks across different datasets.
  • GPT-GNN is robust to different transfer settings between pre-training and fine-tuning.
  • The authors find that fine-tuning the generatively pre-trained GNN model with 10–20% of labeled data offers performance on downstream tasks comparable to that of the supervised GNN model trained with 100% of the training data
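As a rough illustration of how the two generative terms are combined during pre-training, here is a toy PyTorch sketch (our simplification; the function and argument names are ours, and MSE plus a dot-product edge scorer stand in for the paper's attribute and edge likelihoods):

    import torch
    import torch.nn.functional as F

    def toy_gptgnn_loss(h_attr, h_edge, x_true, target_idx, attr_decoder,
                        pos_src, pos_dst, neg_src, neg_dst):
        # h_attr: embeddings computed from the attribute-masked node copies
        # h_edge: embeddings computed from the full node copies
        # Attribute generation: reconstruct the hidden attributes of the target nodes
        # (MSE as a stand-in for the paper's attribute likelihood).
        loss_attr = F.mse_loss(attr_decoder(h_attr[target_idx]), x_true[target_idx])
        # Edge generation: contrast held-out edges against sampled non-edges.
        pos = (h_edge[pos_src] * h_edge[pos_dst]).sum(-1)
        neg = (h_edge[neg_src] * h_edge[neg_dst]).sum(-1)
        labels = torch.cat([torch.ones_like(pos), torch.zeros_like(neg)])
        loss_edge = F.binary_cross_entropy_with_logits(torch.cat([pos, neg]), labels)
        return loss_attr + loss_edge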
Tables
  • Table 1: Performance of different downstream tasks on OAG and Amazon using different pre-training frameworks with the heterogeneous graph transformer (HGT) [15] as the base model. 10% of labeled data is used for fine-tuning. On OAG, the performance improvements brought by GPT-GNN's edge generation, GAE, and GraphSAGE over no pre-training are 10.3%, 7.4%, and 4.0%, respectively. On Amazon, the gains are
  • Table 2: Pre-training gains with different GNN architectures, evaluated on OAG (Paper-Field task) under the combined transfer setting with 10% of training data
  • Table 3: Downstream performance on homogeneous graphs, including the paper citation network in OAG and Reddit
  • Table 4: Generated paper title samples. The left column is generated by GPT-GNN, and the right column is the ground truth
Related work
  • The goal of pre-training is to allow a model (usually neural networks) to initialize its parameters with pre-trained weights. In this way, the model can leverage the commonality between the pretraining and downstream tasks. Recently pre-training has shown superiority in boosting the performance of many downstream applications in computer vision and natural language processing. In the following, we first introduce the preliminaries about GNNs and then review pre-training approaches in graphs and other domains.

    2.1 Preliminaries of Graph Neural Networks

    Recent years have witnessed the success of GNNs for modeling graph data [12, 15, 17, 36]. A GNN can be regarded as using the input graph structure as the computation graph for message passing [9], during which the local neighborhood information is aggregated to get a more contextual representation. Formally, suppose $H_t^{(l)}$ is the representation of node $t$ at the $l$-th GNN layer; the update procedure from the $(l-1)$-th layer to the $l$-th layer is

    $$H_t^{(l)} \leftarrow \operatorname{Aggregate}\Big(\operatorname{Extract}\big(H_s^{(l-1)};\, H_t^{(l-1)},\, e\big) \;\Big|\; \forall s \in N(t),\ \forall e \in E(s,t)\Big),$$

    where $\operatorname{Extract}(\cdot)$ extracts information from the source node representation $H_s^{(l-1)}$, using the target node representation $H_t^{(l-1)}$ and the edge $e$ between the two nodes as the query, and $\operatorname{Aggregate}(\cdot)$ (e.g., mean, sum, or max pooling) aggregates the neighborhood messages into $H_t^{(l)}$.
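A minimal message-passing layer in this Extract/Aggregate form could look like the following PyTorch sketch (a generic illustration with a linear extractor and mean aggregation, not a specific architecture from the paper):

    import torch
    import torch.nn as nn

    class SimpleMessagePassing(nn.Module):
        # One generic GNN layer: Extract builds a per-edge message from (H_s, H_t),
        # Aggregate averages the messages over each target node's neighborhood N(t).
        def __init__(self, d_in, d_out):
            super().__init__()
            self.extract = nn.Linear(2 * d_in, d_out)

        def forward(self, h, edge_index):
            # h: [N, d_in] node representations; edge_index: [2, E] rows = (source s, target t)
            src, dst = edge_index
            msg = self.extract(torch.cat([h[src], h[dst]], dim=-1))          # per-edge message
            out = torch.zeros(h.size(0), msg.size(-1), device=h.device).index_add(0, dst, msg)
            deg = torch.bincount(dst, minlength=h.size(0)).clamp(min=1).unsqueeze(-1)
            return torch.relu(out / deg)   # mean-aggregated, transformed H_t^{(l)}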
Funding
  • This work is partially supported by NSF III-1705169, NSF CAREER Award 1741634, NSF 1937599, DARPA HR00112090027, DARPA N660011924032, Okawa Foundation Grant, and Amazon Research Award
Reference
  • Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2013. Spectral networks and locally connected networks on graphs. arXiv:1312.6203 (2013).
  • Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. 2020. A Simple Framework for Contrastive Learning of Visual Representations. arXiv:2002.05709 (2020).
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In CVPR 2009.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL 2019.
  • Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2014. DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML 2014.
  • Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In KDD 2017.
  • Yuxiao Dong, Ziniu Hu, Kuansan Wang, Yizhou Sun, and Jie Tang. 2020. Heterogeneous Network Representation Learning. In IJCAI 2020.
  • Matthias Fey and Jan Eric Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. ICLR Workshop (2019).
  • Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural Message Passing for Quantum Chemistry. In ICML 2017.
  • Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR 2014.
  • Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD 2016.
  • William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NeurIPS 2017.
  • Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2019. Momentum contrast for unsupervised visual representation learning. arXiv:1911.05722 (2019).
  • Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay S. Pande, and Jure Leskovec. 2020. Strategies for Pre-training Graph Neural Networks. In ICLR 2020.
  • Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. 2020. Heterogeneous Graph Transformer. In WWW 2020.
  • Thomas N. Kipf and Max Welling. 2016. Variational Graph Auto-Encoders. arXiv:1611.07308 (2016).
  • Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR 2017.
  • Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, William L. Hamilton, David Duvenaud, Raquel Urtasun, and Richard S. Zemel. 2019. Efficient Graph Generation with Graph Recurrent Attention Networks. In NeurIPS 2019.
  • Tie-Yan Liu. 2011. Learning to Rank for Information Retrieval. Springer.
  • Ilya Loshchilov and Frank Hutter. 2017. SGDR: Stochastic Gradient Descent with Warm Restarts. In ICLR 2017.
  • Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In ICLR 2019.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS 2013.
  • Jianmo Ni, Jiacheng Li, and Julian J. McAuley. 2019. Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects. In EMNLP 2019.
  • Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. 2016. Context Encoders: Feature Learning by Inpainting. In CVPR 2016.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In EMNLP 2014.
  • Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM 2018.
  • Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. (2019).
  • Michael Sejr Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling Relational Data with Graph Convolutional Networks. In ESWC 2018.
  • Fan-Yun Sun, Jordan Hoffmann, Vikas Verma, and Jian Tang. 2020. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. In ICLR 2020.
  • Yizhou Sun and Jiawei Han. 2012. Mining Heterogeneous Information Networks: Principles and Methodologies. Morgan & Claypool Publishers.
  • Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. 2011. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. In VLDB 2011.
  • Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip S. Yu, and Xiao Yu. 2012. Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. In KDD 2012.
  • Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In WWW 2015.
  • Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: Extraction and mining of academic social networks. In KDD 2008.
  • Aäron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation Learning with Contrastive Predictive Coding. arXiv:1807.03748 (2018).
  • Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In ICLR 2018.
  • Petar Velickovic, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, and R. Devon Hjelm. 2019. Deep Graph Infomax. In ICLR 2019.
  • Kuansan Wang, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, Yuxiao Dong, and Anshul Kanakia. 2020. Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies 1, 1 (2020), 396–413.
  • Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S. Yu. 2019. Heterogeneous Graph Attention Network. In WWW 2019.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. Transformers: State-of-the-art Natural Language Processing. arXiv:1910.03771 (2019).
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In NeurIPS 2019.
  • Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In KDD 2018.
  • Jiaxuan You, Rex Ying, Xiang Ren, William L. Hamilton, and Jure Leskovec. 2018. GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. In ICML 2018.
  • Fanjin Zhang, Xiao Liu, Jie Tang, Yuxiao Dong, Peiran Yao, Jie Zhang, Xiaotao Gu, Yan Wang, Bin Shao, Rui Li, and Kuansan Wang. 2019. OAG: Toward Linking Large-scale Heterogeneous Entity Graphs. In KDD 2019.
  • Difan Zou, Ziniu Hu, Yewen Wang, Song Jiang, Yizhou Sun, and Quanquan Gu. 2019. Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks. In NeurIPS 2019.

Open Academic Graph (OAG) [34, 38, 44] consists of five types of nodes: ‘Paper’, ‘Author’, ‘Field’, ‘Venue’, and ‘Institute’, and 14 types of edges between these nodes. The schema and meta relations are illustrated in Figure 4(a). For example, the ‘Field’ nodes in OAG are categorized into six levels from L0 to L5, which are organized in a hierarchical tree (we use ‘is_organized_in’ to represent this hierarchy). Therefore, we differentiate the ‘Paper–Field’ edges by the corresponding field levels. In addition, we differentiate the different author orders (i.e., the first author, the last author, and others) and venue types (i.e., journal, conference, and preprint). Finally, the ‘Self’ type corresponds to the self-loop connection, which is widely added in GNN architectures. Except for the ‘Self’ and ‘CoAuthor’ edge types, which are symmetric, every other edge type X has a reverse edge type X⁻¹.