Self-supervised Learning: Generative or Contrastive

Keywords:
Context Prediction Model, Sentence Order Prediction, Next Sentence Prediction, Noise Contrastive Estimation, Graph Contrastive Coding

Abstract:

Deep supervised learning has achieved great success in the last decade. However, its deficiencies of dependence on manual labels and vulnerability to attacks have driven people to explore a better solution. As an alternative, self-supervised learning attracts many researchers for its soaring performance on representation learning in the …

Introduction
  • Deep neural networks [81] have shown outstanding performance on various machine learning tasks, especially on supervised learning in computer vision, natural language processing and graph learning.
  • There exist several comprehensive reviews related to Pre-trained Language Models [113], Generative Adversarial Networks [151], Autoencoder and contrastive learning for visual representation [68].
  • Hu et al. [62] propose GPT-GNN, a generative pre-training method for graph neural networks.
Highlights
  • Deep neural networks [81] have shown outstanding performance on various machine learning tasks, especially on supervised learning in computer vision, natural language processing and graph learning
  • Supervised learning is trained on a specific task with a large manually labeled dataset, which is randomly divided into training, validation and test sets
  • There exist several comprehensive reviews related to Pre-trained Language Models [113], Generative Adversarial Networks [151], Autoencoder and contrastive learning for visual representation [68]
  • In Section 2, we introduce the preliminary knowledge for computer vision, natural language processing, and graph learning
  • A similar work that aggregates similar vectors together in the embedding space is the vector quantized variational autoencoder (VQ-VAE) [118], [143], which we introduce in Section 3
  • The inception of adversarial representation learning should be attributed to Generative Adversarial Networks (GAN) [114], which introduced the adversarial training framework (a minimal sketch of this framework follows this list)
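
To make the adversarial training framework concrete, the sketch below shows one GAN training step: the discriminator learns to separate real samples from generated ones, while the generator learns to fool the discriminator. This is an illustration only, not code from the survey or from [114]; the network sizes, optimizer settings and the adversarial_step helper are assumptions made for the example.

```python
# Illustrative sketch (not from the survey): one adversarial training step.
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # hypothetical sizes
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def adversarial_step(real: torch.Tensor):
    batch = real.size(0)
    # discriminator update: push real samples toward label 1, generated samples toward 0
    z = torch.randn(batch, latent_dim)
    fake = G(z).detach()
    d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator update: make the discriminator label fresh fakes as real
    z = torch.randn(batch, latent_dim)
    g_loss = bce(D(G(z)), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# usage: adversarial_step(torch.rand(32, data_dim))
```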
Results
  • Variational auto-encoding models have been employed in node representation learning on graphs.
  • Deep InfoMax [59] is the first to explicitly model mutual information (MI) through a contrastive learning task, maximizing the MI between a local patch and its global context (see the first sketch after this list).
  • It randomly samples two different views of an image to generate the local feature vectors and the context vector. (Fig. 8: Deep Graph InfoMax [147] uses a readout function to generate the summary vector s1 and feeds it into a discriminator together with node 1’s embedding x1 and a corrupted embedding, respectively, to identify which embedding is the real one.)
  • Similar to what CMC has done to improve Deep InfoMax, the authors of [55] propose a contrastive multi-view representation learning method for graphs.
  • Researchers borrow ideas from semi-supervised learning to produce pseudo labels via cluster-based discrimination, and achieve rather good performance on representations (see the second sketch after this list).
  • Clustering-based discrimination may also help other pre-trained models generalize, transferring them from pretext objectives to real tasks more effectively.
  • M3S [131] adopts a similar idea, performing DeepCluster-based self-supervised pre-training for better semi-supervised prediction.
  • A more radical step is taken by BYOL [48], which discards negative sampling in self-supervised learning yet achieves an even better result than InfoMin (see the third sketch after this list). The contrastive learning methods mentioned above learn representations by predicting different views of the same image and cast the prediction problem directly in representation space.
  • No matter how much self-supervised learning models improve, they are still only powerful feature extractors; to transfer to downstream tasks, abundant labels are still needed.
  • In Section 4.2.1, the authors introduced M3S [131], which attempts to combine cluster-based contrastive pre-training with downstream semi-supervised learning.
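
The local-global objective behind Deep InfoMax [59] can be illustrated with an InfoNCE-style contrastive bound on mutual information: local patches from the same image as the global context are positives, patches from other images are negatives. The sketch below is a simplified stand-in, not the authors' implementation; the tensor shapes, the temperature value and the local_global_infonce helper are assumptions made for this example.

```python
# Minimal sketch (assumed shapes, not the official Deep InfoMax code).
import torch
import torch.nn.functional as F

def local_global_infonce(local_feats: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
    """local_feats: [B, L, D] local patch vectors; context: [B, D] global summary vectors."""
    B, L, D = local_feats.shape
    local_feats = F.normalize(local_feats, dim=-1)
    context = F.normalize(context, dim=-1)
    # scores[i, j, l] = similarity between context of image i and patch l of image j
    scores = torch.einsum("id,jld->ijl", context, local_feats) / 0.1  # temperature 0.1 (assumed)
    scores = scores.reshape(B, B * L)          # each row: one context vs. all B*L patches
    # positives for context i are its own L patches, at flat indices i*L .. i*L+L-1
    labels = (torch.arange(B).unsqueeze(1) * L + torch.arange(L)).reshape(-1)
    # score each (context, positive patch) pair against all candidate patches
    loss = F.cross_entropy(scores.repeat_interleave(L, dim=0), labels)
    return loss
```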
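
Cluster-based discrimination in the spirit of DeepCluster and M3S [131] can be sketched as: cluster the current features, use the cluster assignments as pseudo labels, and train the encoder plus a fresh classification head to predict them. Everything below (the pseudo_label_epoch helper, the single-batch treatment and the k-means settings) is a hypothetical simplification, not the original training pipeline.

```python
# Sketch of cluster-based pseudo-label discrimination (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

def pseudo_label_epoch(encoder: nn.Module, data: torch.Tensor, k: int = 100, lr: float = 1e-3):
    # 1) assign pseudo labels by clustering the frozen features
    with torch.no_grad():
        feats = encoder(data)                                   # [N, D]
    pseudo = KMeans(n_clusters=k, n_init=10).fit_predict(feats.cpu().numpy())
    pseudo = torch.as_tensor(pseudo, dtype=torch.long)

    # 2) discriminate the clusters: predict each sample's pseudo label
    head = nn.Linear(feats.size(1), k)
    opt = torch.optim.SGD(list(encoder.parameters()) + list(head.parameters()), lr=lr)
    logits = head(encoder(data))
    loss = F.cross_entropy(logits, pseudo)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```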
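
BYOL's negative-free objective [48] can be summarized as an online network predicting the output that a slowly updated momentum ("target") network produces for another view of the same image, with gradients flowing only through the online branch. The following is a rough sketch under assumed toy architectures; the real method additionally symmetrizes the loss over the two views and uses specific projector and predictor designs not shown here.

```python
# Sketch of a BYOL-style objective without negative samples (placeholder networks).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 128
online_encoder = nn.Sequential(nn.Linear(784, dim), nn.ReLU(), nn.Linear(dim, dim))
predictor = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
target_encoder = copy.deepcopy(online_encoder)   # updated only by momentum, never by gradients
for p in target_encoder.parameters():
    p.requires_grad_(False)

def byol_loss(view1: torch.Tensor, view2: torch.Tensor) -> torch.Tensor:
    pred = F.normalize(predictor(online_encoder(view1)), dim=-1)
    with torch.no_grad():
        target = F.normalize(target_encoder(view2), dim=-1)
    # squared distance between unit vectors, equivalent to 2 - 2 * cosine similarity
    return (2 - 2 * (pred * target).sum(dim=-1)).mean()

@torch.no_grad()
def momentum_update(tau: float = 0.99):
    # target parameters follow an exponential moving average of the online parameters
    for p_t, p_o in zip(target_encoder.parameters(), online_encoder.parameters()):
        p_t.mul_(tau).add_((1 - tau) * p_o)
```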
Conclusion
  • They propose a 3-step framework: 1) do self-supervised pre-training as in SimCLR v1, with some minor architecture modifications and a deeper ResNet; 2) fine-tune the last few layers with only 1% or 10% of the original ImageNet labels; 3) distill the fine-tuned network on unlabeled data into a student network (a sketch of the contrastive pre-training objective in step 1 follows this list).
  • A reason for the generative model’s success in self-supervised learning is its ability to fit the data distribution, based on which varied downstream tasks can be conducted.
  • The inception of adversarial representation learning should be attributed to Generative Adversarial Networks (GAN) [114], which introduced the adversarial training framework.
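
For step 1 of the framework above, the contrastive pre-training objective in SimCLR-style methods is commonly written as the normalized temperature-scaled cross-entropy (NT-Xent) loss over two augmented views of each image. The sketch below is a simplified stand-in rather than SimCLR v2's actual code; the encoder, projector and augmentation functions in the usage comment are hypothetical placeholders.

```python
# Simplified NT-Xent contrastive loss over two views of the same batch of images.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: [B, D] projections of two augmented views of the same B images."""
    B = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)    # [2B, D]
    sim = z @ z.t() / temperature                           # [2B, 2B] cosine similarities
    sim.fill_diagonal_(float("-inf"))                       # never contrast a view with itself
    # the positive for sample i is its other view: i+B for the first half, i-B for the second
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)])
    return F.cross_entropy(sim, targets)

# usage (hypothetical helpers): loss = nt_xent(projector(encoder(aug1(x))), projector(encoder(aug2(x))))
```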
Tables
  • Table 1: An overview of recent self-supervised representation learning. For acronyms used, “FOS” refers to fields of study; “NS” refers to negative samples; “PS” refers to positive samples; “MI” refers to mutual information. For letters in the “Type” column: G = Generative; C = Contrastive; G-C = Generative-Contrastive (Adversarial)
Funding
  • The work is supported by the National Key R&D Program of China (2018YFB1402600), NSFC for Distinguished Young Scholar (61825602), and NSFC (61836013)
Reference
  • H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, and M. Marchand. Domain-adversarial neural networks. arXiv preprint arXiv:1412.4446, 2014.
  • F. Alam, S. Joty, and M. Imran. Domain adaptation with adversarial training and graph embeddings. arXiv preprint arXiv:1805.05151, 2018.
  • A. A. Alemi, B. Poole, I. Fischer, J. V. Dillon, R. A. Saurous, and K. Murphy. Fixing a broken ELBO. arXiv preprint arXiv:1711.00464, 2017.
  • S. Arora, H. Khandeparkar, M. Khodak, O. Plevrakis, and N. Saunshi. A theoretical analysis of contrastive unsupervised representation learning. arXiv preprint arXiv:1902.09229, 2019.
  • A. Asai, K. Hashimoto, H. Hajishirzi, R. Socher, and C. Xiong. Learning to retrieve reasoning paths over Wikipedia graph for question answering. arXiv preprint arXiv:1911.10470, 2019.
  • P. Bachman, R. D. Hjelm, and W. Buchwalter. Learning representations by maximizing mutual information across views. In NIPS, pages 15509–15519, 2019.
  • Y. Bai, H. Ding, S. Bian, T. Chen, Y. Sun, and W. Wang. Simgnn: A neural network approach to fast graph similarity computation. In WSDM, pages 384–392, 2019.
  • D. H. Ballard. Modular learning in neural networks. In AAAI, pages 279–284, 1987.
  • D. Bau, J.-Y. Zhu, H. Strobelt, B. Zhou, J. B. Tenenbaum, W. T. Freeman, and A. Torralba. Gan dissection: Visualizing and understanding generative adversarial networks. arXiv preprint arXiv:1811.10597, 2018.
  • Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb):1137–1155, 2003.
  • Y. Bengio, N. Leonard, and A. Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
  • Y. Bengio, P. Y. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
  • M. Besserve, R. Sun, and B. Scholkopf. Counterfactuals uncover the modular structure of deep generative models. arXiv preprint arXiv:1812.03253, 2018.
  • Y. Blau and T. Michaeli. Rethinking lossy compression: The rate-distortion-perception tradeoff. arXiv preprint arXiv:1901.07821, 2019.
  • P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017.
  • A. Brock, J. Donahue, and K. Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
  • L. Cai and W. Y. Wang. Kbgan: Adversarial learning for knowledge graph embeddings. arXiv preprint arXiv:1711.04071, 2017.
  • S. Cao, W. Lu, and Q. Xu. Grarep: Learning graph representations with global structural information. In CIKM ’15, 2015.
  • S. Cao, W. Lu, and Q. Xu. Deep neural networks for learning graph representations. In AAAI, 2016.
  • M. Caron, P. Bojanowski, A. Joulin, and M. Douze. Deep clustering for unsupervised learning of visual features. In ECCV, pages 132–149, 2018.
  • H. Chen, B. Perozzi, Y. Hu, and S. Skiena. Harp: Hierarchical representation learning for networks. In AAAI, 2018.
  • T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709, 2020.
  • T. Chen, S. Kornblith, K. Swersky, M. Norouzi, and G. Hinton. Big self-supervised models are strong semi-supervised learners. arXiv preprint arXiv:2006.10029, 2020.
  • T. Chen, Y. Sun, Y. Shi, and L. Hong. On sampling strategies for neural network-based collaborative filtering. In SIGKDD, pages 767–776, 2017.
  • X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, pages 2172–2180, 2016.
  • X. Chen, H. Fan, R. Girshick, and K. He. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020.
  • L. Chongxuan, T. Xu, J. Zhu, and B. Zhang. Triple generative adversarial nets. In NIPS, pages 4088–4098, 2017.
  • K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning. Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555, 2020.
  • A. Conneau, G. Lample, M. Ranzato, L. Denoyer, and H. Jegou. Word translation without parallel data. arXiv preprint arXiv:1710.04087, 2017.
  • Q. Dai, Q. Li, J. Tang, and D. Wang. Adversarial network embedding. In AAAI, 2018.
  • Z. Dai, Z. Yang, Y. Yang, J. G. Carbonell, Q. Le, and R. Salakhutdinov. Transformer-xl: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2978–2988, 2019.
  • V. R. de Sa. Learning classification with unlabeled data. In NIPS, pages 112–119, 1994.
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, pages 248–255.
  • J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.
  • M. Ding, J. Tang, and J. Zhang. Semi-supervised learning on graphs with generative adversarial nets. In Proceedings of the 27th ACM CIKM, pages 913–922, 2018.
  • M. Ding, C. Zhou, Q. Chen, H. Yang, and J. Tang. Cognitive graph for multi-hop reading comprehension at scale. arXiv preprint arXiv:1905.05460, 2019.
  • L. Dinh, D. Krueger, and Y. Bengio. Nice: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014.
  • L. Dinh, J. Sohl-Dickstein, and S. Bengio. Density estimation using real nvp. arXiv preprint arXiv:1605.08803, 2016.
  • C. Doersch, A. Gupta, and A. A. Efros. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE ICCV, pages 1422–1430, 2015.
  • J. Donahue, P. Krahenbuhl, and T. Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
  • J. Donahue and K. Simonyan. Large scale adversarial representation learning. In NIPS, pages 10541–10551, 2019.
  • C. Donnat, M. Zitnik, D. Hallac, and J. Leskovec. Learning structural node embeddings via diffusion wavelets. In SIGKDD, pages 1320–1329, 2018.
  • V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, and A. Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
  • Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
  • S. Gidaris, P. Singh, and N. Komodakis. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728, 2018.
  • R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, pages 580–587, 2014.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
  • J.-B. Grill, F. Strub, F. Altche, C. Tallec, P. H. Richemond, E. Buchatskaya, C. Doersch, B. A. Pires, Z. D. Guo, M. G. Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. arXiv preprint arXiv:2006.07733, 2020.
  • A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In SIGKDD, pages 855–864, 2016.
  • A. Grover, A. Zweig, and S. Ermon. Graphite: Iterative generative modeling of graphs. In ICML, 2018.
  • M. Gutmann and A. Hyvarinen. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 297–304, 2010.
  • K. Guu, K. Lee, Z. Tung, P. Pasupat, and M.-W. Chang. Realm: Retrieval-augmented language model pre-training. arXiv preprint arXiv:2002.08909, 2020.
  • W. L. Hamilton, R. Ying, and J. Leskovec. Representation learning on graphs: Methods and applications. IEEE Data Eng. Bull., 40:52–74, 2017.
  • W. L. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. In NIPS, 2017.
  • K. Hassani and A. H. Khasahmadi. Contrastive multi-view representation learning on graphs. arXiv preprint arXiv:2006.05582, 2020.
  • K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick. Momentum contrast for unsupervised visual representation learning. arXiv preprint arXiv:1911.05722, 2019.
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
  • D. Hendrycks, M. Mazeika, S. Kadavath, and D. Song. Using self-supervised learning can improve model robustness and uncertainty. In NeurIPS, pages 15663–15674, 2019.
  • R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio. Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670, 2018.
  • J. Ho, X. Chen, A. Srinivas, Y. Duan, and P. Abbeel. Flow++: Improving flow-based generative models with variational dequantization and architecture design. In ICML, pages 2722–2730, 2019.
  • W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, and J. Leskovec. Strategies for pre-training graph neural networks. In ICLR, 2019.
  • Z. Hu, Y. Dong, K. Wang, K.-W. Chang, and Y. Sun. Gpt-gnn: Generative pre-training of graph neural networks. arXiv preprint arXiv:2006.15437, 2020.
  • Z. Hu, Y. Dong, K. Wang, and Y. Sun. Heterogeneous graph transformer. arXiv preprint arXiv:2003.01332, 2020.
  • G. Huang, Z. Liu, and K. Q. Weinberger. Densely connected convolutional networks. 2017 IEEE CVPR, pages 2261–2269, 2017.
  • S. Iizuka, E. Simo-Serra, and H. Ishikawa. Globally and locally consistent image completion. ACM Transactions on Graphics (ToG), 36(4):1–14, 2017.
  • S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ArXiv, abs/1502.03167, 2015.
  • P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, pages 1125–1134, 2017.
  • L. Jing and Y. Tian. Self-supervised visual feature learning with deep neural networks: A survey. arXiv preprint arXiv:1902.06162, 2019.
  • M. Joshi, D. Chen, Y. Liu, D. S. Weld, L. Zettlemoyer, and O. Levy. Spanbert: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8:64–77, 2020.
  • T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. In CVPR, pages 4401–4410, 2019.
  • D. Kim, D. Cho, D. Yoo, and I. S. Kweon. Learning image representations by completing damaged jigsaw puzzles. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 793–802. IEEE, 2018.
  • D. P. Kingma and P. Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In NIPS, pages 10215–10224, 2018.
  • D. P. Kingma and M. Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
  • T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • T. N. Kipf and M. Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016.
  • L. Kong, C. d. M. d’Autume, W. Ling, L. Yu, Z. Dai, and D. Yogatama. A mutual information maximization perspective of language representation learning. arXiv preprint arXiv:1910.08350, 2019.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.
  • Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut. Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.
  • G. Larsson, M. Maire, and G. Shakhnarovich. Learning representations for automatic colorization. In ECCV, pages 577–593.
  • G. Larsson, M. Maire, and G. Shakhnarovich. Colorization as a proxy task for visual understanding. In CVPR, pages 6874–6883, 2017.
  • Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
  • Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Comput., 1(4):541–551, Dec. 1989.
  • Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Muller. Efficient backprop. In Neural Networks: Tricks of the Trade, This Book is an Outgrowth of a 1996 NIPS Workshop, page 950, Berlin, Heidelberg, 1998. Springer-Verlag.
  • C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, pages 4681–4690, 2017.
  • D. Li, W.-C. Hung, J.-B. Huang, S. Wang, N. Ahuja, and M.-H. Yang. Unsupervised visual representation learning by graph-based consistent constraints. In ECCV, pages 678–694.
  • R. Li, S. Wang, F. Zhu, and J. Huang. Adaptive graph convolutional neural networks. ArXiv, abs/1801.03226, 2018.
  • B. Liu. Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1):1–167, 2012.
  • Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
  • J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, pages 3431–3440, 2015.
  • A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
  • M. Mathieu. Masked autoencoder for distribution estimation. 2015.
  • T. Mikolov, K. Chen, G. S. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.
  • T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS’13, pages 3111–3119, 2013.
  • I. Misra and L. van der Maaten. Self-supervised learning of pretext-invariant representations. arXiv preprint arXiv:1912.01991, 2019.
  • V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In ICML, 2010.
  • A. Newell and J. Deng. How useful is self-supervised pretraining for visual tasks? In CVPR, pages 7345–7354, 2020.
  • A. Ng et al. Sparse autoencoder. CS294A Lecture notes, 72(2011):1–19, 2011.
  • M. Noroozi and P. Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV, pages 69–84.
  • M. Noroozi, A. Vinjimoor, P. Favaro, and H. Pirsiavash. Boosting self-supervised learning via knowledge transfer. In CVPR, pages 9359–9367, 2018.
  • S. Nowozin, B. Cseke, and R. Tomioka. f-gan: Training generative neural samplers using variational divergence minimization. In NIPS, pages 271–279, 2016.
  • A. v. d. Oord, Y. Li, and O. Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
  • M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu. Asymmetric transitivity preserving graph embedding. In KDD ’16, 2016.
  • D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In CVPR, pages 2536–2544, 2016.
  • Z. Peng, Y. Dong, M. Luo, X.-M. Wu, and Q. Zheng. Self-supervised graph representation learning via global context prediction. ArXiv, abs/2003.01604, 2020.
  • Z. Peng, Y. Dong, M. Luo, X.-M. Wu, and Q. Zheng. Self-supervised graph representation learning via global context prediction. arXiv preprint arXiv:2003.01604, 2020.
  • B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In SIGKDD, pages 701–710, 2014.
  • M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. Deep contextualized word representations. arXiv preprint arXiv:1802.05365, 2018.
  • M. Popova, M. Shvets, J. Oliva, and O. Isayev. Molecularrnn: Generating realistic molecular graphs with optimized properties. arXiv preprint arXiv:1905.13372, 2019.
  • J. Qiu, Q. Chen, Y. Dong, J. Zhang, H. Yang, M. Ding, K. Wang, and J. Tang. Gcc: Graph contrastive coding for graph neural network pre-training. arXiv preprint arXiv:2006.09963, 2020.
  • J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM ’18, 2018.
  • J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM, pages 459–467, 2018.
  • J. Qiu, J. Tang, H. Ma, Y. Dong, K. Wang, and J. Tang. Deepinf: Social influence prediction with deep learning. In KDD’18, pages 2110–2119. ACM, 2018.
  • X. Qiu, T. Sun, Y. Xu, Y. Shao, N. Dai, and X. Huang. Pre-trained models for natural language processing: A survey. arXiv preprint arXiv:2003.08271, 2020.
  • A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever. Improving language understanding by generative pre-training.
  • A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.
  • P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang. Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250, 2016.
  • A. Razavi, A. van den Oord, and O. Vinyals. Generating diverse high-fidelity images with vq-vae-2. In NIPS, pages 14837–14847, 2019.
  • L. F. Ribeiro, P. H. Saverese, and D. R. Figueiredo. struc2vec: Learning node representations from structural identity. In SIGKDD, pages 385–394, 2017.
  • M. T. Ribeiro, S. Singh, and C. Guestrin. “Why should I trust you?” Explaining the predictions of any classifier. In SIGKDD, pages 1135–1144, 2016.
  • N. Sarafianos, X. Xu, and I. A. Kakadiaris. Adversarial representation learning for text-to-image matching. In Proceedings of the IEEE ICCV, pages 5814–5824, 2019.
  • A. M. Saxe, J. L. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. CoRR, abs/1312.6120, 2013.
  • J. Shen, Y. Qu, W. Zhang, and Y. Yu. Adversarial representation learning for domain adaptation. stat, 1050:5, 2017.
  • T. Shen, T. Lei, R. Barzilay, and T. Jaakkola. Style transfer from non-parallel text by cross-alignment. In NIPS, pages 6830–6841, 2017.
  • C. Shi, M. Xu, Z. Zhu, W. Zhang, M. Zhang, and J. Tang. Graphaf: A flow-based autoregressive model for molecular graph generation. arXiv preprint arXiv:2001.09382, 2020.
  • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B.-J. P. Hsu, and K. Wang. An overview of microsoft academic service (mas) and applications. In WWW’15, pages 243–246, 2015.
  • P. Smolensky. Information processing in dynamical systems: Foundations of harmony theory. Technical report, Colorado Univ at Boulder Dept of Computer Science, 1986.
  • F.-Y. Sun, J. Hoffmann, and J. Tang. Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv preprint arXiv:1908.01000, 2019.
  • F.-Y. Sun, M. Qu, J. Hoffmann, C.-W. Huang, and J. Tang. vgraph: A generative model for joint community detection and node representation learning. In NIPS, pages 512–522, 2019.
  • K. Sun, Z. Zhu, and Z. Lin. Multi-stage self-supervised learning for graph convolutional networks. arXiv preprint arXiv:1902.11038, 2019.
  • Y. Sun, S. Wang, Y. Li, S. Feng, X. Chen, H. Zhang, X. Tian, D. Zhu, H. Tian, and H. Wu. Ernie: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223, 2019.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. 2015 IEEE CVPR, pages 1–9, 2015.
  • J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. Line: Large-scale information network embedding. In WWW’15, pages 1067–1077, 2015.
  • J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. Arnetminer: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 990–998, 2008.
  • W. L. Taylor. “Cloze procedure”: A new tool for measuring readability. Journalism Quarterly, 30(4):415–433, 1953.
  • Y. Tian, D. Krishnan, and P. Isola. Contrastive multiview coding. arXiv preprint arXiv:1906.05849, 2019.
  • Y. Tian, C. Sun, B. Poole, D. Krishnan, C. Schmid, and P. Isola. What makes for good views for contrastive learning? arXiv preprint arXiv:2005.10243, 2020.
  • M. Tschannen, O. Bachem, and M. Lucic. Recent advances in autoencoder-based representation learning. arXiv preprint arXiv:1812.05069, 2018.
  • M. Tschannen, J. Djolonga, P. K. Rubenstein, S. Gelly, and M. Lucic. On mutual information maximization for representation learning. arXiv preprint arXiv:1907.13625, 2019.
  • [142] A. Van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al. Conditional image generation with pixelcnn decoders. In NIPS, pages 4790–4798, 2016.
  • [143] A. van den Oord, O. Vinyals, et al. Neural discrete representation learning. In NIPS, pages 6306–6315, 2017.
  • [144] A. Van Oord, N. Kalchbrenner, and K. Kavukcuoglu. Pixel recurrent neural networks. In ICML, pages 1747–1756, 2016.
  • [145] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In NIPS, pages 5998–6008, 2017.
  • [146] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
  • [147] P. Velickovic, W. Fedus, W. L. Hamilton, P. Lio, Y. Bengio, and R. D. Hjelm. Deep graph infomax. arXiv preprint arXiv:1809.10341, 2018.
  • [148] H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie, and M. Guo. Graphgan: Graph representation learning with generative adversarial nets. In AAAI, 2018.
  • [149] P. Wang, S. Li, and R. Pan. Incorporating gan for negative sampling in knowledge representation learning. In AAAI, 2018.
  • [150] T. Wang and P. Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. arXiv preprint arXiv:2005.10242, 2020.
  • [151] Z. Wang, Q. She, and T. E. Ward. Generative adversarial networks: A survey and taxonomy. arXiv preprint arXiv:1906.01529, 2019.
  • [152] C. Wei, L. Xie, X. Ren, Y. Xia, C. Su, J. Liu, Q. Tian, and A. L. Yuille. Iterative reorganization with weak spatial constraints: Solving arbitrary jigsaw puzzles for unsupervised representation learning. In CVPR, pages 1910–1919, 2019.
  • [153] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin. Unsupervised feature learning via non-parametric instance discrimination. In CVPR, pages 3733–3742, 2018.
  • [154] Q. Xie, M.-T. Luong, E. Hovy, and Q. V. Le. Self-training with noisy student improves imagenet classification. In CVPR, pages 10687–10698, 2020.
  • [155] S. Xie, R. B. Girshick, P. Dollar, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. 2017 IEEE CVPR, pages 5987–5995, 2017.
  • [156] W. Xiong, J. Du, W. Y. Wang, and V. Stoyanov. Pretrained encyclopedia: Weakly supervised knowledge-pretrained language model. arXiv preprint arXiv:1912.09637, 2019.
  • [157] X. Yan, I. Misra, A. Gupta, D. Ghadiyaram, and D. Mahajan. Clusterfit: Improving generalization of visual representations. arXiv preprint arXiv:1912.03330, 2019.
  • [158] J. Yang, D. Parikh, and D. Batra. Joint unsupervised learning of deep representations and image clusters. In CVPR, pages 5147–5156, 2016.
  • [159] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le. Xlnet: Generalized autoregressive pretraining for language understanding. In NIPS, pages 5754–5764, 2019.
  • [160] Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning. Hotpotqa: A dataset for diverse, explainable multi-hop question answering. arXiv preprint arXiv:1809.09600, 2018.
  • [161] J. You, B. Liu, Z. Ying, V. Pande, and J. Leskovec. Graph convolutional policy network for goal-directed molecular graph generation. In NIPS, pages 6410–6421, 2018.
  • [162] J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec. Graphrnn: Generating realistic graphs with deep auto-regressive models. In ICML, pages 5708–5717, 2018.
  • [163] Y. You, T. Chen, Z. Wang, and Y. Shen. When does self-supervision help graph convolutional networks? arXiv preprint arXiv:2006.09136, 2020.
  • [164] S. Zagoruyko and N. Komodakis. Wide residual networks. ArXiv, abs/1605.07146, 2016.
  • [165] F. Zhang, X. Liu, J. Tang, Y. Dong, P. Yao, J. Zhang, X. Gu, Y. Wang, B. Shao, R. Li, and K. Wang. Oag: Toward linking large-scale heterogeneous entity graphs. In KDD’19, pages 2585–2595, 2019.
  • [166] J. Zhang, Y. Dong, Y. Wang, J. Tang, and M. Ding. Prone: Fast and scalable network representation learning. In IJCAI, pages 4278–4284, 2019.
  • [167] M. Zhang, Z. Cui, M. Neumann, and Y. Chen. An end-to-end deep learning architecture for graph classification. In AAAI, 2018.
  • [168] R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In ECCV, pages 649–666.
  • [169] R. Zhang, P. Isola, and A. A. Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In CVPR, pages 1058–1067, 2017.
  • [170] Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu. Ernie: Enhanced language representation with informative entities. arXiv preprint arXiv:1905.07129, 2019.
  • [171] D. Zhu, P. Cui, D. Wang, and W. Zhu. Deep variational network embedding in wasserstein space. In SIGKDD, pages 2827–2836, 2018.
  • [172] J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman. Toward multimodal image-to-image translation. In NIPS, pages 465–476, 2017.
  • [173] C. Zhuang, A. L. Zhai, and D. Yamins. Local aggregation for unsupervised learning of visual embeddings. In Proceedings of the IEEE ICCV, pages 6002–6012, 2019.
  • [174] B. Zoph, G. Ghiasi, T.-Y. Lin, Y. Cui, H. Liu, E. D. Cubuk, and Q. V. Le. Rethinking pre-training and self-training. arXiv preprint arXiv:2006.06882, 2020.
  • [175] B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.
  • Li Mian received her bachelor's degree (2020) from the Department of Computer Science, Beijing Institute of Technology. She is now admitted to a graduate program at the Georgia Institute of Technology. Her research interests focus on data mining, natural language processing and machine learning.