S4L: Self-Supervised Semi-Supervised Learning

Avital Oliver
Alexander Kolesnikov

International Conference on Computer Vision, pp. 1476-1485, 2019.

Keywords:
Virtual Adversarial Training, Mix Of All Models, semi-supervised method, learning method, image recognition

Abstract:

This work tackles the problem of semi-supervised learning of image classifiers. Our main insight is that the field of semi-supervised learning can benefit from the quickly advancing field of self-supervised visual representation learning. Unifying these two approaches, we propose the framework of self-supervised semi-supervised learning (S4L). [...]

Introduction
  • Modern computer vision systems demonstrate outstanding performance on a variety of challenging computer vision benchmarks, such as image recognition [32], object detection [20], semantic image segmentation [8], etc.
  • Their success relies on the availability of a large amount of annotated data that is time-consuming and expensive to acquire.
  • The fact that humans quickly understand new concepts after seeing only a few examples suggests that learning from limited annotated data is achievable in principle
Highlights
  • Many real-world computer vision applications are concerned with visual categories that are not present in standard benchmark datasets, or with applications of dynamic nature where visual categories or their appearance may change over time
  • We further investigate the gap between the representation learned by an S4L model (MOAM) and a corresponding baseline trained on 100 % of the labels
  • We have bridged the gap between self-supervised and semi-supervised learning by suggesting an S4L framework which can be used to turn any self-supervised method into a semi-supervised learning model
  • We further showed that S4L methods are complementary to existing semi-supervision techniques, and Mix Of All Models (MOAM), our proposed combination of those, leads to state-of-the-art performance
Methods
  • The authors present the self-supervised semisupervised learning (S4L) techniques.
  • The authors first provide a general description of the approach.
  • The authors focus on the semi-supervised image classification problem.
  • The authors assume a data-generating joint distribution p(X, Y) over images and labels.
  • The learning algorithm has access to a labeled training set Dl, which is sampled i.i.d. from p(X, Y), and an unlabeled training set Du, which is sampled i.i.d. from the marginal distribution p(X)
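The training objective implied by this setup, a supervised loss on Dl plus a self-supervised loss on Du, can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the function names are ours, and the 4-way rotation task stands in for the self-supervised objective used by S4L-Rotation:

```python
import numpy as np

def cross_entropy(logits, labels):
    # Softmax cross-entropy, averaged over the batch.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def s4l_loss(sup_logits, sup_labels, rot_logits, rot_labels, w=1.0):
    # Supervised loss on the labeled batch from Dl, plus a weighted
    # self-supervised loss on the unlabeled batch from Du (here:
    # predicting which of the four rotations {0, 90, 180, 270} was applied).
    l_sup = cross_entropy(sup_logits, sup_labels)
    l_self = cross_entropy(rot_logits, rot_labels)
    return l_sup + w * l_self
```

The weight w trades off the two signals; in practice it would be tuned on held-out labeled data.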
Results
  • The authors further demonstrate that by combining the best S4L methods with existing semi-supervised techniques, the authors achieve new state-of-the-art performance on the semi-supervised ILSVRC-2012 benchmark.
  • A more detailed presentation of the results is provided in the supplementary material
  • Using this only slightly altered training procedure, the baseline models achieve 80.43 % top-5 accuracy (56.35 % top-1) on the public ILSVRC-2012 validation set when trained on only 10 % of the full training set.
  • The supervision loss Lsup on the augmented images generated by self-supervision does improve performance by almost 1 %
  • This allows using multiple transformed copies of an image at inference time and taking the average of their predictions.
  • The model obtained after this first step achieves 88.80% top-5 accuracy on ILSVRC-2012
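The inference-time averaging over transformed copies can be sketched as follows; a hedged illustration in which `model_fn` is a hypothetical callable returning a vector of class probabilities for one image:

```python
import numpy as np

def predict_with_rotations(model_fn, image):
    # Run the model on the four rotated copies of the image and
    # average the resulting class-probability vectors.
    preds = [model_fn(np.rot90(image, k)) for k in range(4)]
    return np.mean(preds, axis=0)
```

Because the rotation task trains the network on all four orientations, averaging over them at test time is a natural way to use the extra views.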
Conclusion
  • Discussion and Future Work

    In this paper, the authors have bridged the gap between self-supervised and semi-supervised learning by suggesting an S4L framework which can be used to turn any self-supervised method into a semi-supervised learning model.

    The authors instantiated two such methods, S4L-Rotation and S4L-Exemplar, and have shown that they perform competitively with methods from the semi-supervised literature on the challenging ILSVRC-2012 dataset.
  • The authors hope that this work inspires researchers in the field of self-supervision to consider extending their methods into semi-supervised methods using the S4L framework, as well as researchers in the field of semi-supervised learning to take inspiration from the vast number of recently proposed self-supervision methods
Summary
  • Objectives:

    The authors aim to provide a strong baseline for future research by performing a relatively large hyperparameter search for a model trained on only 10 % of ILSVRC-2012.
Tables
  • Table 1: Top-5 accuracy [%] obtained by individual methods when training them on ILSVRC-2012 with a subset of labels. All methods use the same standard-width ResNet50v2 architecture
  • Table 2: Comparing our MOAM to previous methods in the literature on ILSVRC-2012 with 10 % of the labels. Note that different models use different architectures, larger than those in Table 1
Related work
  • In this work we build on top of the current state-of-the-art in both fields of semi-supervised and self-supervised learning. Therefore, in this section we review the most relevant developments in these fields.

    2.1. Semi-supervised Learning

    Semi-supervised learning describes a class of algorithms that seek to learn from both unlabeled and labeled samples, typically assumed to be sampled from the same or similar distributions. Approaches differ on what information to gain from the structure of the unlabeled data.

    Given the wide variety of semi-supervised learning techniques proposed in the literature, we refer to [4] for an extensive survey. For more context, we focus on recent developments based on deep neural networks.
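As a concrete instance of such a deep semi-supervised technique, entropy minimization [10] penalizes uncertain predictions on unlabeled examples, pushing the classifier toward confident outputs. A minimal sketch (the function name is ours):

```python
import numpy as np

def entropy_minimization_loss(logits):
    # Entropy of the predicted class distribution, averaged over the
    # unlabeled batch; minimizing it rewards confident (low-entropy)
    # predictions on unlabeled data.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=1).mean()
```

Such unlabeled-data losses are typically added to the supervised loss with a tunable weight, exactly the pattern S4L reuses with self-supervised objectives.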
Reference
  • [1] Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, and Andrew Gordon Wilson. There are many consistent explanations of unlabeled data: Why you should average. ICLR, 2019.
  • [2] David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. MixMatch: A holistic approach to semi-supervised learning. arXiv preprint arXiv:1905.02249, 2019.
  • [3] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. European Conference on Computer Vision (ECCV), 2018.
  • [4] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. The MIT Press, 1st edition, 2010.
  • [5] Carl Doersch, Abhinav Gupta, and Alexei A Efros. Unsupervised visual representation learning by context prediction. In International Conference on Computer Vision (ICCV), 2015.
  • [6] Alexey Dosovitskiy, Jost Tobias Springenberg, Martin Riedmiller, and Thomas Brox. Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), 2014.
  • [7] Frederik Ebert, Sudeep Dasari, Alex X Lee, Sergey Levine, and Chelsea Finn. Robustness via retrying: Closed-loop robotic manipulation with self-supervised learning. Conference on Robot Learning (CoRL), 2018.
  • [8] Mark Everingham, SM Ali Eslami, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes challenge: A retrospective. IJCV, 111(1), 2015.
  • [9] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In International Conference on Learning Representations (ICLR), 2018.
  • [10] Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Advances in Neural Information Processing Systems 17, pages 529–536. MIT Press, 2005.
  • [11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • [12] Olivier J Hénaff, Ali Razavi, Carl Doersch, SM Eslami, and Aaron van den Oord. Data-efficient image recognition with contrastive predictive coding. arXiv preprint arXiv:1905.09272, 2019.
  • [13] Alexander Hermans, Lucas Beyer, and Bastian Leibe. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017.
  • [14] Eric Jang, Coline Devin, Vincent Vanhoucke, and Sergey Levine. Grasp2Vec: Learning object representations from self-supervised grasping. In Conference on Robot Learning (CoRL), 2018.
  • [15] Diederik P. Kingma, Danilo Jimenez Rezende, Shakir Mohamed, and Max Welling. Semi-supervised learning with deep generative models. CoRR, abs/1406.5298, 2014.
  • [16] Alexander Kolesnikov, Xiaohua Zhai, and Lucas Beyer. Revisiting self-supervised visual representation learning. In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [17] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
  • [18] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. CoRR, abs/1610.02242, 2016.
  • [19] Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. ICML 2013 Workshop: Challenges in Representation Learning (WREPL), July 2013.
  • [20] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV. Springer, 2014.
  • [21] Bin Liu, Zhirong Wu, Han Hu, and Stephen Lin. Deep metric transfer for label propagation with limited annotated data. arXiv preprint arXiv:1812.08781, 2018.
  • [22] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. arXiv preprint arXiv:1704.03976, 2017.
  • [23] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
  • [24] Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision (ECCV), 2016.
  • [25] Mehdi Noroozi, Hamed Pirsiavash, and Paolo Favaro. Representation learning by learning to count. In International Conference on Computer Vision (ICCV), 2017.
  • [26] Mehdi Noroozi, Ananth Vinjimoor, Paolo Favaro, and Hamed Pirsiavash. Boosting self-supervised learning via knowledge transfer. Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [27] Augustus Odena. Semi-supervised learning with generative adversarial networks. arXiv preprint arXiv:1606.01583, 2016.
  • [28] Avital Oliver, Augustus Odena, Colin A Raffel, Ekin Dogus Cubuk, and Ian Goodfellow. Realistic evaluation of deep semi-supervised learning algorithms. In Advances in Neural Information Processing Systems, pages 3239–3250, 2018.
  • [29] Andrew Owens and Alexei A Efros. Audio-visual scene analysis with self-supervised multisensory features. European Conference on Computer Vision (ECCV), 2018.
  • [30] Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, and Lawrence Carin. Variational autoencoder for deep learning of images, labels and captions. In Advances in Neural Information Processing Systems 29, pages 2352–2360, 2016.
  • [31] Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems 28, pages 3546–3554. Curran Associates, Inc., 2015.
  • [32] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
  • [33] Tim Salimans, Ian J. Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. CoRR, abs/1606.03498, 2016.
  • [34] Nawid Sayed, Biagio Brattoli, and Björn Ommer. Cross and learn: Cross-modal self-supervision. arXiv preprint arXiv:1811.03879, 2018.
  • [35] Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, and Sergey Levine. Time-contrastive networks: Self-supervised learning from video. arXiv preprint arXiv:1704.06888, 2017.
  • [36] Rui Shu, Hung Bui, Hirokazu Narui, and Stefano Ermon. A DIRT-T approach to unsupervised domain adaptation. In International Conference on Learning Representations (ICLR), 2018.
  • [37] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • [38] Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems, pages 1195–1204, 2017.
  • [39] Vikas Verma, Alex Lamb, Juho Kannala, Yoshua Bengio, and David Lopez-Paz. Interpolation consistency training for semi-supervised learning. arXiv preprint arXiv:1903.03825, 2019.
  • [40] Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, and Quoc V Le. Unsupervised data augmentation. arXiv preprint arXiv:1904.12848, 2019.
  • [41] Richard Zhang, Phillip Isola, and Alexei A Efros. Colorful image colorization. In European Conference on Computer Vision (ECCV), 2016.
  • [42] Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. Learning deep features for scene recognition using places database. In Advances in Neural Information Processing Systems, pages 487–495, 2014.