# Self-supervised Learning: Generative or Contrastive

Keywords:

Context Prediction Model, Sentence Order Prediction, Next Sentence Prediction, Noise Contrastive Estimation, Graph Contrastive Coding

Abstract:

Deep supervised learning has achieved great success in the last decade. However, its dependence on manual labels and its vulnerability to attacks have driven researchers to explore better solutions. As an alternative, self-supervised learning has attracted many researchers for its soaring performance on representation learning in the...

Introduction

- Deep neural networks [81] have shown outstanding performance on various machine learning tasks, especially on supervised learning in computer vision, natural language processing and graph learning.
- Several comprehensive reviews already cover Pre-trained Language Models [113], Generative Adversarial Networks [151], autoencoders, and contrastive learning for visual representation [68].
- Hu et al. [62] propose GPT-GNN, a generative pre-training method for graph neural networks.

Highlights

- Deep neural networks [81] have shown outstanding performance on various machine learning tasks, especially on supervised learning in computer vision, natural language processing and graph learning
- Supervised learning is trained on a specific task with a large, manually labeled dataset that is randomly divided into training, validation, and test sets
- Several comprehensive reviews already cover Pre-trained Language Models [113], Generative Adversarial Networks [151], autoencoders, and contrastive learning for visual representation [68]
- In Section 2, we introduce the preliminary knowledge for computer vision, natural language processing, and graph learning
- A related approach that aggregates similar vectors in embedding space is the vector-quantized variational autoencoder (VQ-VAE) [118], [143], which we introduce in Section 3
- Adversarial representation learning originates with Generative Adversarial Networks (GAN) [114], which proposed the adversarial training framework
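
The vector quantization idea mentioned in the highlights can be illustrated with a minimal sketch: each continuous vector is replaced by the index of its nearest entry in a codebook, so that similar vectors collapse onto the same code. This is only the nearest-neighbor lookup step, not the VQ-VAE training procedure; the function names, the codebook, and the sample vectors are hypothetical and chosen for illustration.

```python
import math


def nearest_code(vector, codebook):
    """Return (index, code) of the codebook entry closest to `vector` (Euclidean)."""
    best_index = min(
        range(len(codebook)),
        key=lambda i: math.dist(vector, codebook[i]),
    )
    return best_index, codebook[best_index]


def quantize(vectors, codebook):
    """Map each input vector to the index of its nearest codebook entry."""
    return [nearest_code(v, codebook)[0] for v in vectors]


# Hypothetical 2-D codebook and encoder outputs (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
encoder_outputs = [(0.1, -0.2), (0.9, 1.2), (4.0, 6.0), (0.2, 0.1)]

indices = quantize(encoder_outputs, codebook)
print(indices)
```

In this toy setup the first and last vectors share a code because both lie closest to the same codebook entry, which is the aggregation effect the highlight describes.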

Results

- Variational auto-encoding models have been employed in node representation learning on graphs.
- Deep InfoMax [59] is the first to explicitly model mutual information through a contrastive learning task, which maximizes the MI between a local patch and its global context.
- It randomly samples two different views of an image to generate the local feature vector and the context vector.
- Fig. 8: Deep Graph InfoMax [147] uses a readout function to generate a summary vector s1, and feeds it into a discriminator together with node 1’s embedding x1 and its corrupted embedding x̃1, respectively, to identify which embedding is the real one.
- Just as CMC improved on Deep InfoMax, the authors of [55] propose a contrastive multi-view representation learning method for graphs.
- Researchers borrow ideas from semi-supervised learning to produce pseudo labels via cluster-based discrimination, and achieve rather good performance on representations.
- Clustering-based discrimination may help in the generalization of other pre-trained models, transferring models from pretext objectives to real tasks better.
- M3S [131] adopts a similar idea, performing DeepCluster-based self-supervised pre-training for better semi-supervised prediction.
- A more radical step is taken by BYOL [48], which discards negative sampling in self-supervised learning yet achieves even better results than InfoMin. The contrastive learning methods mentioned above learn representations by predicting different views of the same image, casting the prediction problem directly in representation space.
- No matter how much self-supervised learning models improve, they remain only powerful feature extractors, and transferring them to downstream tasks still requires abundant labels.
- Section 4.2.1 introduced M3S [131], which attempts to combine cluster-based contrastive pre-training with downstream semi-supervised learning.
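
Several of the results above rely on a contrastive objective that scores one positive pair against negative samples. A minimal sketch of one common form, the InfoNCE loss, is given below for a single query. It is an illustration under stated assumptions, not the exact estimator used by any cited method: the dot-product similarity, the `temperature` value, and the toy embeddings are all assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(query, positive, negatives, temperature=0.1):
    """InfoNCE for one query: negative log-softmax of the positive's score.

    The positive is placed at index 0 of the logits; the negatives follow.
    """
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Toy unit-norm embeddings (illustrative values only).
query = (1.0, 0.0)
negatives = [(0.0, 1.0), (-1.0, 0.0)]

loss_aligned = info_nce_loss(query, (1.0, 0.0), negatives)
loss_misaligned = info_nce_loss(query, (0.0, 1.0), negatives)
print(loss_aligned, loss_misaligned)
```

The loss is small when the positive is closer to the query than every negative and grows when a negative scores as high as the positive, which is why these methods depend on how negative samples are chosen.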

Conclusion

- They propose a 3-step framework: 1) Do self-supervised pre-training as in SimCLR v1, with minor architecture modifications and a deeper ResNet. 2) Fine-tune the last few layers with only 1% or 10% of the original ImageNet labels.
- A reason for the generative model’s success in self-supervised learning is its ability to fit the data distribution, based on which varied downstream tasks can be conducted.
- Adversarial representation learning originates with Generative Adversarial Networks (GAN) [114], which proposed the adversarial training framework.
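
For reference, the adversarial training framework is usually written as a two-player minimax game. The notation below is an assumption, since this summary does not define symbols: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the data distribution, and $p_z$ the noise prior.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

The discriminator is trained to tell real samples from generated ones, while the generator is trained to make $D(G(z))$ large, which ties the quality of the learned distribution to how well it fits the data.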

- Table 1: An overview of recent self-supervised representation learning. Acronyms: “FOS” refers to fields of study; “NS” to negative samples; “PS” to positive samples; “MI” to mutual information. Letters in “Type”: G = Generative; C = Contrastive; G-C = Generative-Contrastive (Adversarial).

Funding

- The work is supported by the National Key R&D Program of China (2018YFB1402600), NSFC for Distinguished Young Scholar (61825602), and NSFC (61836013)

Reference

- H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, and M. Marchand. Domain-adversarial neural networks. arXiv preprint arXiv:1412.4446, 2014.
- F. Alam, S. Joty, and M. Imran. Domain adaptation with adversarial training and graph embeddings. arXiv preprint arXiv:1805.05151, 2018.
- A. A. Alemi, B. Poole, I. Fischer, J. V. Dillon, R. A. Saurous, and K. Murphy. Fixing a broken elbo. arXiv preprint arXiv:1711.00464, 2017.
- S. Arora, H. Khandeparkar, M. Khodak, O. Plevrakis, and N. Saunshi. A theoretical analysis of contrastive unsupervised representation learning. arXiv preprint arXiv:1902.09229, 2019.
- A. Asai, K. Hashimoto, H. Hajishirzi, R. Socher, and C. Xiong. Learning to retrieve reasoning paths over wikipedia graph for question answering. arXiv preprint arXiv:1911.10470, 2019.
- P. Bachman, R. D. Hjelm, and W. Buchwalter. Learning representations by maximizing mutual information across views. In NIPS, pages 15509–15519, 2019.
- Y. Bai, H. Ding, S. Bian, T. Chen, Y. Sun, and W. Wang. Simgnn: A neural network approach to fast graph similarity computation. In WSDM, pages 384–392, 2019.
- D. H. Ballard. Modular learning in neural networks. In AAAI, pages 279–284, 1987.
- D. Bau, J.-Y. Zhu, H. Strobelt, B. Zhou, J. B. Tenenbaum, W. T. Freeman, and A. Torralba. Gan dissection: Visualizing and understanding generative adversarial networks. arXiv preprint arXiv:1811.10597, 2018.
- Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. Journal of machine learning research, 3(Feb):1137–1155, 2003.
- Y. Bengio, N. Leonard, and A. Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
- Y. Bengio, P. Y. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks, 5(2):157–166, 1994.
- M. Besserve, R. Sun, and B. Scholkopf. Counterfactuals uncover the modular structure of deep generative models. arXiv preprint arXiv:1812.03253, 2018.
- Y. Blau and T. Michaeli. Rethinking lossy compression: The rate-distortion-perception tradeoff. arXiv preprint arXiv:1901.07821, 2019.
- P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017.
- A. Brock, J. Donahue, and K. Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
- L. Cai and W. Y. Wang. Kbgan: Adversarial learning for knowledge graph embeddings. arXiv preprint arXiv:1711.04071, 2017.
- S. Cao, W. Lu, and Q. Xu. Grarep: Learning graph representations with global structural information. In CIKM ’15, 2015.
- S. Cao, W. Lu, and Q. Xu. Deep neural networks for learning graph representations. In AAAI, 2016.
- M. Caron, P. Bojanowski, A. Joulin, and M. Douze. Deep clustering for unsupervised learning of visual features. In ECCV, pages 132–149, 2018.
- H. Chen, B. Perozzi, Y. Hu, and S. Skiena. Harp: Hierarchical representation learning for networks. In AAAI, 2018.
- T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709, 2020.
- T. Chen, S. Kornblith, K. Swersky, M. Norouzi, and G. Hinton. Big self-supervised models are strong semi-supervised learners. arXiv preprint arXiv:2006.10029, 2020.
- T. Chen, Y. Sun, Y. Shi, and L. Hong. On sampling strategies for neural network-based collaborative filtering. In SIGKDD, pages 767–776, 2017.
- X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, pages 2172–2180, 2016.
- X. Chen, H. Fan, R. Girshick, and K. He. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020.
- L. Chongxuan, T. Xu, J. Zhu, and B. Zhang. Triple generative adversarial nets. In NIPS, pages 4088–4098, 2017.
- K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning. Electra: Pretraining text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555, 2020.
- A. Conneau, G. Lample, M. Ranzato, L. Denoyer, and H. Jegou. Word translation without parallel data. arXiv preprint arXiv:1710.04087, 2017.
- Q. Dai, Q. Li, J. Tang, and D. Wang. Adversarial network embedding. In AAAI, 2018.
- Z. Dai, Z. Yang, Y. Yang, J. G. Carbonell, Q. Le, and R. Salakhutdinov. Transformer-xl: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2978–2988, 2019.
- V. R. de Sa. Learning classification with unlabeled data. In NIPS, pages 112–119, 1994.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, pages 248–255.
- J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pretraining of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.
- M. Ding, J. Tang, and J. Zhang. Semi-supervised learning on graphs with generative adversarial nets. In Proceedings of the 27th ACM CIKM, pages 913–922, 2018.
- M. Ding, C. Zhou, Q. Chen, H. Yang, and J. Tang. Cognitive graph for multi-hop reading comprehension at scale. arXiv preprint arXiv:1905.05460, 2019.
- L. Dinh, D. Krueger, and Y. Bengio. Nice: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014.
- L. Dinh, J. Sohl-Dickstein, and S. Bengio. Density estimation using real nvp. arXiv preprint arXiv:1605.08803, 2016.
- C. Doersch, A. Gupta, and A. A. Efros. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE ICCV, pages 1422–1430, 2015.
- J. Donahue, P. Krahenbuhl, and T. Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
- J. Donahue and K. Simonyan. Large scale adversarial representation learning. In NIPS, pages 10541–10551, 2019.
- C. Donnat, M. Zitnik, D. Hallac, and J. Leskovec. Learning structural node embeddings via diffusion wavelets. In SIGKDD, pages 1320–1329, 2018.
- V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, and A. Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
- Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
- S. Gidaris, P. Singh, and N. Komodakis. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728, 2018.
- R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, pages 580–587, 2014.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
- J.-B. Grill, F. Strub, F. Altche, C. Tallec, P. H. Richemond, E. Buchatskaya, C. Doersch, B. A. Pires, Z. D. Guo, M. G. Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. arXiv preprint arXiv:2006.07733, 2020.
- A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In SIGKDD, pages 855–864, 2016.
- A. Grover, A. Zweig, and S. Ermon. Graphite: Iterative generative modeling of graphs. In ICML, 2018.
- M. Gutmann and A. Hyvarinen. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 297–304, 2010.
- K. Guu, K. Lee, Z. Tung, P. Pasupat, and M.-W. Chang. Realm: Retrieval-augmented language model pre-training. arXiv preprint arXiv:2002.08909, 2020.
- W. L. Hamilton, R. Ying, and J. Leskovec. Representation learning on graphs: Methods and applications. IEEE Data Eng. Bull., 40:52–74, 2017.
- W. L. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. In NIPS, 2017.
- K. Hassani and A. H. Khasahmadi. Contrastive multi-view representation learning on graphs. arXiv preprint arXiv:2006.05582, 2020.
- K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick. Momentum contrast for unsupervised visual representation learning. arXiv preprint arXiv:1911.05722, 2019.
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
- D. Hendrycks, M. Mazeika, S. Kadavath, and D. Song. Using self-supervised learning can improve model robustness and uncertainty. In NeurIPS, pages 15663–15674, 2019.
- R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio. Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670, 2018.
- J. Ho, X. Chen, A. Srinivas, Y. Duan, and P. Abbeel. Flow++: Improving flow-based generative models with variational dequantization and architecture design. In ICML, pages 2722–2730, 2019.
- W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, and J. Leskovec. Strategies for pre-training graph neural networks. In ICLR, 2019.
- Z. Hu, Y. Dong, K. Wang, K.-W. Chang, and Y. Sun. Gpt-gnn: Generative pre-training of graph neural networks. arXiv preprint arXiv:2006.15437, 2020.
- Z. Hu, Y. Dong, K. Wang, and Y. Sun. Heterogeneous graph transformer. arXiv preprint arXiv:2003.01332, 2020.
- G. Huang, Z. Liu, and K. Q. Weinberger. Densely connected convolutional networks. 2017 IEEE CVPR, pages 2261–2269, 2017.
- S. Iizuka, E. Simo-Serra, and H. Ishikawa. Globally and locally consistent image completion. ACM Transactions on Graphics (ToG), 36(4):1–14, 2017.
- S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ArXiv, abs/1502.03167, 2015.
- P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, pages 1125–1134, 2017.
- L. Jing and Y. Tian. Self-supervised visual feature learning with deep neural networks: A survey. arXiv preprint arXiv:1902.06162, 2019.
- M. Joshi, D. Chen, Y. Liu, D. S. Weld, L. Zettlemoyer, and O. Levy. Spanbert: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8:64–77, 2020.
- T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. In CVPR, pages 4401–4410, 2019.
- D. Kim, D. Cho, D. Yoo, and I. S. Kweon. Learning image representations by completing damaged jigsaw puzzles. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 793–802. IEEE, 2018.
- D. P. Kingma and P. Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In NIPS, pages 10215–10224, 2018.
- D. P. Kingma and M. Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
- T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
- T. N. Kipf and M. Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016.
- L. Kong, C. d. M. d’Autume, W. Ling, L. Yu, Z. Dai, and D. Yogatama. A mutual information maximization perspective of language representation learning. arXiv preprint arXiv:1910.08350, 2019.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.
- Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut. Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.
- G. Larsson, M. Maire, and G. Shakhnarovich. Learning representations for automatic colorization. In ECCV, pages 577–593.
- G. Larsson, M. Maire, and G. Shakhnarovich. Colorization as a proxy task for visual understanding. In CVPR, pages 6874–6883, 2017.
- Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. nature, 521(7553):436–444, 2015.
- Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Comput., 1(4):541–551, Dec. 1989.
- Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Muller. Efficient backprop. In Neural Networks: Tricks of the Trade, This Book is an Outgrowth of a 1996 NIPS Workshop, pages 9–50, Berlin, Heidelberg, 1998. Springer-Verlag.
- C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photorealistic single image super-resolution using a generative adversarial network. In CVPR, pages 4681–4690, 2017.
- D. Li, W.-C. Hung, J.-B. Huang, S. Wang, N. Ahuja, and M.-H. Yang. Unsupervised visual representation learning by graph-based consistent constraints. In ECCV, pages 678–694.
- R. Li, S. Wang, F. Zhu, and J. Huang. Adaptive graph convolutional neural networks. ArXiv, abs/1801.03226, 2018.
- B. Liu. Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1):1–167, 2012.
- Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
- J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, pages 3431–3440, 2015.
- A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
- M. Mathieu. Masked autoencoder for distribution estimation. 2015.
- T. Mikolov, K. Chen, G. S. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS’13, pages 3111–3119, 2013.
- I. Misra and L. van der Maaten. Self-supervised learning of pretext-invariant representations. arXiv preprint arXiv:1912.01991, 2019.
- V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In ICML, 2010.
- A. Newell and J. Deng. How useful is self-supervised pretraining for visual tasks? In CVPR, pages 7345–7354, 2020.
- A. Ng et al. Sparse autoencoder. CS294A Lecture notes, 72(2011):1–19, 2011.
- M. Noroozi and P. Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV, pages 69–84.
- M. Noroozi, A. Vinjimoor, P. Favaro, and H. Pirsiavash. Boosting self-supervised learning via knowledge transfer. In CVPR, pages 9359–9367, 2018.
- S. Nowozin, B. Cseke, and R. Tomioka. f-gan: Training generative neural samplers using variational divergence minimization. In NIPS, pages 271–279, 2016.
- A. v. d. Oord, Y. Li, and O. Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
- M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu. Asymmetric transitivity preserving graph embedding. In KDD ’16, 2016.
- D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In CVPR, pages 2536–2544, 2016.
- Z. Peng, Y. Dong, M. Luo, X.-M. Wu, and Q. Zheng. Self-supervised graph representation learning via global context prediction. ArXiv, abs/2003.01604, 2020.
- Z. Peng, Y. Dong, M. Luo, X.-M. Wu, and Q. Zheng. Self-supervised graph representation learning via global context prediction. arXiv preprint arXiv:2003.01604, 2020.
- B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In SIGKDD, pages 701–710, 2014.
- M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. Deep contextualized word representations. arXiv preprint arXiv:1802.05365, 2018.
- M. Popova, M. Shvets, J. Oliva, and O. Isayev. Molecularrnn: Generating realistic molecular graphs with optimized properties. arXiv preprint arXiv:1905.13372, 2019.
- J. Qiu, Q. Chen, Y. Dong, J. Zhang, H. Yang, M. Ding, K. Wang, and J. Tang. Gcc: Graph contrastive coding for graph neural network pre-training. arXiv preprint arXiv:2006.09963, 2020.
- J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM ’18, 2018.
- J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM, pages 459–467, 2018.
- J. Qiu, J. Tang, H. Ma, Y. Dong, K. Wang, and J. Tang. Deepinf: Social influence prediction with deep learning. In KDD’18, pages 2110–2119. ACM, 2018.
- X. Qiu, T. Sun, Y. Xu, Y. Shao, N. Dai, and X. Huang. Pre-trained models for natural language processing: A survey. arXiv preprint arXiv:2003.08271, 2020.
- A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
- A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever. Improving language understanding by generative pre-training.
- A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.
- P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang. Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250, 2016.
- A. Razavi, A. van den Oord, and O. Vinyals. Generating diverse high-fidelity images with vq-vae-2. In NIPS, pages 14837–14847, 2019.
- L. F. Ribeiro, P. H. Saverese, and D. R. Figueiredo. struc2vec: Learning node representations from structural identity. In SIGKDD, pages 385–394, 2017.
- M. T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?" Explaining the predictions of any classifier. In SIGKDD, pages 1135–1144, 2016.
- N. Sarafianos, X. Xu, and I. A. Kakadiaris. Adversarial representation learning for text-to-image matching. In Proceedings of the IEEE ICCV, pages 5814–5824, 2019.
- A. M. Saxe, J. L. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. CoRR, abs/1312.6120, 2013.
- J. Shen, Y. Qu, W. Zhang, and Y. Yu. Adversarial representation learning for domain adaptation. stat, 1050:5, 2017.
- T. Shen, T. Lei, R. Barzilay, and T. Jaakkola. Style transfer from non-parallel text by cross-alignment. In NIPS, pages 6830–6841, 2017.
- C. Shi, M. Xu, Z. Zhu, W. Zhang, M. Zhang, and J. Tang. Graphaf: a flow-based autoregressive model for molecular graph generation. arXiv preprint arXiv:2001.09382, 2020.
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B.-j. P. Hsu, and K. Wang. An overview of microsoft academic service (mas) and applications. In WWW’15, pages 243–246, 2015.
- P. Smolensky. Information processing in dynamical systems: Foundations of harmony theory. Technical report, Colorado Univ at Boulder Dept of Computer Science, 1986.
- F.-Y. Sun, J. Hoffmann, and J. Tang. Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv preprint arXiv:1908.01000, 2019.
- F.-Y. Sun, M. Qu, J. Hoffmann, C.-W. Huang, and J. Tang. vgraph: A generative model for joint community detection and node representation learning. In NIPS, pages 512–522, 2019.
- K. Sun, Z. Zhu, and Z. Lin. Multi-stage self-supervised learning for graph convolutional networks. arXiv preprint arXiv:1902.11038, 2019.
- Y. Sun, S. Wang, Y. Li, S. Feng, X. Chen, H. Zhang, X. Tian, D. Zhu, H. Tian, and H. Wu. Ernie: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223, 2019.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. 2015 IEEE CVPR, pages 1–9, 2015.
- J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. Line: Large-scale information network embedding. In WWW’15, pages 1067–1077, 2015.
- J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. Arnetminer: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 990–998, 2008.
- W. L. Taylor. cloze procedure: A new tool for measuring readability. Journalism quarterly, 30(4):415–433, 1953.
- Y. Tian, D. Krishnan, and P. Isola. Contrastive multiview coding. arXiv preprint arXiv:1906.05849, 2019.
- Y. Tian, C. Sun, B. Poole, D. Krishnan, C. Schmid, and P. Isola. What makes for good views for contrastive learning. arXiv preprint arXiv:2005.10243, 2020.
- M. Tschannen, O. Bachem, and M. Lucic. Recent advances in autoencoder-based representation learning. arXiv preprint arXiv:1812.05069, 2018.
- M. Tschannen, J. Djolonga, P. K. Rubenstein, S. Gelly, and M. Lucic. On mutual information maximization for representation learning. arXiv preprint arXiv:1907.13625, 2019.
- [142] A. Van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al. Conditional image generation with pixelcnn decoders. In NIPS, pages 4790–4798, 2016.
- [143] A. van den Oord, O. Vinyals, et al. Neural discrete representation learning. In NIPS, pages 6306–6315, 2017.
- [144] A. Van Oord, N. Kalchbrenner, and K. Kavukcuoglu. Pixel recurrent neural networks. In ICML, pages 1747–1756, 2016.
- [145] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In NIPS, pages 5998–6008, 2017.
- [146] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
- [147] P. Velickovic, W. Fedus, W. L. Hamilton, P. Lio, Y. Bengio, and R. D. Hjelm. Deep graph infomax. arXiv preprint arXiv:1809.10341, 2018.
- [148] H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie, and M. Guo. Graphgan: Graph representation learning with generative adversarial nets. In AAAI, 2018.
- [149] P. Wang, S. Li, and R. Pan. Incorporating gan for negative sampling in knowledge representation learning. In AAAI, 2018.
- [150] T. Wang and P. Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. arXiv preprint arXiv:2005.10242, 2020.
- [151] Z. Wang, Q. She, and T. E. Ward. Generative adversarial networks: A survey and taxonomy. arXiv preprint arXiv:1906.01529, 2019.
- [152] C. Wei, L. Xie, X. Ren, Y. Xia, C. Su, J. Liu, Q. Tian, and A. L. Yuille. Iterative reorganization with weak spatial constraints: Solving arbitrary jigsaw puzzles for unsupervised representation learning. In CVPR, pages 1910–1919, 2019.
- [153] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin. Unsupervised feature learning via non-parametric instance discrimination. In CVPR, pages 3733–3742, 2018.
- [154] Q. Xie, M.-T. Luong, E. Hovy, and Q. V. Le. Self-training with noisy student improves imagenet classification. In CVPR, pages 10687–10698, 2020.
- [155] S. Xie, R. B. Girshick, P. Dollar, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. 2017 IEEE CVPR, pages 5987–5995, 2017.
- [156] W. Xiong, J. Du, W. Y. Wang, and V. Stoyanov. Pretrained encyclopedia: Weakly supervised knowledge-pretrained language model. arXiv preprint arXiv:1912.09637, 2019.
- [157] X. Yan, I. Misra, A. Gupta, D. Ghadiyaram, and D. Mahajan. Clusterfit: Improving generalization of visual representations. arXiv preprint arXiv:1912.03330, 2019.
- [158] J. Yang, D. Parikh, and D. Batra. Joint unsupervised learning of deep representations and image clusters. In CVPR, pages 5147–5156, 2016.
- [159] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le. Xlnet: Generalized autoregressive pretraining for language understanding. In NIPS, pages 5754–5764, 2019.
- [160] Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning. Hotpotqa: A dataset for diverse, explainable multi-hop question answering. arXiv preprint arXiv:1809.09600, 2018.
- [161] J. You, B. Liu, Z. Ying, V. Pande, and J. Leskovec. Graph convolutional policy network for goal-directed molecular graph generation. In NIPS, pages 6410–6421, 2018.
- [162] J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec. Graphrnn: Generating realistic graphs with deep auto-regressive models. In ICML, pages 5708–5717, 2018.
- [163] Y. You, T. Chen, Z. Wang, and Y. Shen. When does self-supervision help graph convolutional networks? arXiv preprint arXiv:2006.09136, 2020.
- [164] S. Zagoruyko and N. Komodakis. Wide residual networks. ArXiv, abs/1605.07146, 2016.
- [165] F. Zhang, X. Liu, J. Tang, Y. Dong, P. Yao, J. Zhang, X. Gu, Y. Wang, B. Shao, R. Li, and K. Wang. Oag: Toward linking large-scale heterogeneous entity graphs. In KDD’19, pages 2585–2595, 2019.
- [166] J. Zhang, Y. Dong, Y. Wang, J. Tang, and M. Ding. Prone: fast and scalable network representation learning. In IJCAI, pages 4278–4284, 2019.
- [167] M. Zhang, Z. Cui, M. Neumann, and Y. Chen. An end-to-end deep learning architecture for graph classification. In AAAI, 2018.
- [168] R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In ECCV, pages 649–666.
- [169] R. Zhang, P. Isola, and A. A. Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In CVPR, pages 1058–1067, 2017.
- [170] Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu. Ernie: Enhanced language representation with informative entities. arXiv preprint arXiv:1905.07129, 2019.
- [171] D. Zhu, P. Cui, D. Wang, and W. Zhu. Deep variational network embedding in wasserstein space. In SIGKDD, pages 2827–2836, 2018.
- [172] J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman. Toward multimodal image-to-image translation. In NIPS, pages 465–476, 2017.
- [173] C. Zhuang, A. L. Zhai, and D. Yamins. Local aggregation for unsupervised learning of visual embeddings. In Proceedings of the IEEE ICCV, pages 6002–6012, 2019.
- [174] B. Zoph, G. Ghiasi, T.-Y. Lin, Y. Cui, H. Liu, E. D. Cubuk, and Q. V. Le. Rethinking pre-training and self-training. arXiv preprint arXiv:2006.06882, 2020.
- [175] B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.
- v2/v3 (June 2020): correct several typos and mistakes.
- v4 (July 2020): add newly published papers; add a new theoretical analysis part for the contrastive objective; add the connection between semi-supervised self-training and self-supervised contrastive learning.
- Li Mian received her bachelor's degree (2020) from the Department of Computer Science, Beijing Institute of Technology. She has been admitted to a graduate program at the Georgia Institute of Technology. Her research interests focus on data mining, natural language processing, and machine learning.
