Self-Training With Noisy Student Improves ImageNet Classification

CVPR, pp. 10684-10695, 2020.

Keywords:
student model, Noisy Student Training, image recognition, mean corruption error, deep learning (20+ more)

Abstract:

We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2.

Introduction
  • Deep learning has shown remarkable successes in image recognition in recent years [45, 80, 75, 30, 83].
  • Noisy Student Training also has a sizable impact on robustness.
  • For this purpose, the authors use a much larger corpus of unlabeled images, where a large fraction of the images do not belong to the ImageNet training set distribution.
  • The authors train the model with Noisy Student Training, a semi-supervised learning approach, which has three main steps: (1) train a teacher model on labeled images, (2) use the teacher to generate pseudo labels on unlabeled images, and (3) train a student model on the combination of labeled images and pseudo labeled images.
  • The authors iterate this algorithm a few times by treating the student as a teacher to relabel the unlabeled data and then training a new student; a minimal sketch of this loop follows.
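    As a concrete illustration of the three steps and the iteration above, the sketch below is a minimal runnable toy version, not the authors' code: scikit-learn classifiers stand in for EfficientNets, Gaussian input noise stands in for RandAugment, dropout, and stochastic depth, and hard pseudo labels stand in for the soft pseudo labels used in the paper.

```python
# Minimal Noisy Student loop (illustrative toy sketch only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def add_input_noise(x, rng, scale=0.1):
    # Stand-in for the input/model noise applied to the student during training.
    return x + rng.normal(0.0, scale, size=x.shape)

def noisy_student(x_labeled, y_labeled, x_unlabeled, iterations=3, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: train a teacher model on labeled images.
    teacher = LogisticRegression(max_iter=1000).fit(x_labeled, y_labeled)
    for _ in range(iterations):
        # Step 2: the un-noised teacher generates pseudo labels on unlabeled images.
        pseudo_labels = teacher.predict(x_unlabeled)
        # Step 3: a noised student trains on labeled plus pseudo-labeled images.
        x_all = add_input_noise(np.vstack([x_labeled, x_unlabeled]), rng)
        y_all = np.concatenate([y_labeled, pseudo_labels])
        student = LogisticRegression(max_iter=1000).fit(x_all, y_all)
        # Iterate: the student becomes the teacher and relabels the data.
        teacher = student
    return teacher
```

    In the paper, noise is applied only when training the student, not when the teacher generates pseudo labels, and the student is typically at least as large as the teacher (cf. Table 12).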
Highlights
  • Deep learning has shown remarkable successes in image recognition in recent years [45, 80, 75, 30, 83]
  • To intuitively understand the significant improvements on the three robustness benchmarks, we show several images in Figure 3 where the predictions of the standard model are incorrect while the predictions of the model with Noisy Student Training are correct
  • We evaluate our EfficientNet-L2 models, trained with and without Noisy Student Training, against an FGSM attack
  • We filter out low-confidence images according to the predictions of the EfficientNet-B0 Noisy Student Training model and keep only the top 130K images for each class, ranked by the confidence of the top-1 predicted class (see the sketch after this list)
  • We showed that it is possible to use unlabeled images to significantly advance both accuracy and robustness of state-of-the-art ImageNet models
  • Our experiments showed that Noisy Student Training and EfficientNet can achieve an accuracy of 88.4%, which is 2.9% higher than the same model trained without Noisy Student Training
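    The confidence-based filtering and per-class selection above can be sketched as follows; this is an illustrative NumPy version (the function name, the 0.3 threshold, and the array layout are our assumptions, not the authors' code).

```python
# Illustrative filtering/balancing of unlabeled images by teacher confidence.
import numpy as np

def select_unlabeled(probs, confidence_threshold=0.3, k_per_class=130_000):
    """probs: (num_images, num_classes) softmax outputs of the teacher.
    Returns indices of the images to keep."""
    top1_class = probs.argmax(axis=1)   # predicted class per image
    top1_conf = probs.max(axis=1)       # confidence of that prediction
    keep = []
    for c in np.unique(top1_class):
        # Confident images whose top-1 prediction is class c.
        idx = np.where((top1_class == c) & (top1_conf >= confidence_threshold))[0]
        # Keep at most k_per_class of them, ranked by confidence.
        idx = idx[np.argsort(-top1_conf[idx])][:k_per_class]
        keep.extend(idx.tolist())
    return np.array(sorted(keep))
```

    Table 13 suggests that balancing the unlabeled data this way helps smaller models in particular.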
Methods
  • Flattened rows of the comparison table (see Table 2): ResNet-50 Billion-scale [93], ResNeXt-101 Billion-scale [93], ResNeXt-101 WSL [55], FixRes ResNeXt-101 WSL [86], Noisy Student Training (ResNet-50), and Noisy Student Training (EfficientNet-L2), compared on # Params, Extra Data, Top-1 Acc., and Top-5 Acc.
  • The Extra Data column ranges from 3.5B images labeled with tags and 300M weakly labeled images from JFT to the 300M unlabeled images from JFT used by Noisy Student Training.
  • The top-1 accuracy reported in this paper is the average accuracy for all images included in ImageNet-P.
Results
  • Robustness results on ImageNet-A, ImageNet-C, and ImageNet-P: the authors evaluate the best model, which achieves 88.4% top-1 accuracy, on these three robustness test sets.
  • ImageNet-A test set [32] consists of difficult images that cause significant drops in accuracy to state-of-the-art models.
  • Similar to the case for JFT, the authors first filter images from the ImageNet validation set.
  • The authors use the full ImageNet as the labeled data and the 130M images from JFT as the unlabeled data.
  • As shown in Table 18, Noisy Student Training improves the baseline accuracy from 98.1% to 98.6% and outperforms the previous state-of-the-art results achieved by RandAugment with Wide-ResNet-28-10
Conclusion
  • Prior works on weakly supervised learning required billions of weakly labeled images to improve state-of-the-art ImageNet models.
  • The authors' experiments showed that Noisy Student Training and EfficientNet can achieve an accuracy of 88.4%, which is 2.9% higher than the same model trained without Noisy Student Training.
  • This result is a new state of the art and 2.0% better than the previous best method, which used an order of magnitude more weakly labeled data [55, 86].
Tables
  • Table1: Summary of key results compared to previous state-of-the-art models [86, 55]. Lower is better for mean corruption error (mCE) and mean flip rate (mFR)
  • Table2: Top-1 and Top-5 Accuracy of Noisy Student Training and previous state-of-the-art methods on ImageNet. EfficientNet-L2 with Noisy Student Training is the result of iterative training, putting the student model back as the new teacher. It has a better tradeoff in terms of accuracy and model size compared to previous state-of-the-art models. †: Big Transfer is a concurrent work that performs transfer learning from the JFT dataset
  • Table3: Robustness results on ImageNet-A
  • Table4: Robustness results on ImageNet-C. mCE is the weighted average of error rates on different corruptions, with AlexNet’s error rate as a baseline (lower is better; mCE and mFR are defined formally after this table list)
  • Table5: Robustness results on ImageNet-P, where images are generated with a sequence of perturbations. mFR measures the model’s probability of flipping predictions under perturbations with AlexNet as a baseline (lower is better)
  • Table6: Ablation study of noising. We use EfficientNet-B5 as the teacher model and study two cases with different numbers of unlabeled images and different augmentations. For the experiment with 1.3M unlabeled images, we use the standard augmentation including random translation and flipping for both the teacher and the student. For the experiment with 130M unlabeled images, we use RandAugment. Aug and SD denote data augmentation and stochastic depth respectively. We remove the noise for unlabeled images while keeping it for labeled images. Here, iterative training is not used and the unlabeled batch size is set to be the same as the labeled batch size to save training time
  • Table7: Iterative training improves the accuracy, where batch size ratio denotes the ratio between unlabeled data and labeled data
  • Table8: Results using YFCC100M and JFT as the unlabeled dataset
  • Table9: Architecture specifications for EfficientNets used in the paper. The width w and depth d are the scaling factors that need to be contextualized in EfficientNet [83]. Train Res. and Test Res. denote training and testing resolutions respectively
  • Table10: Using our best model with 88.4% accuracy as the teacher (denoted as Noisy Student Training (X, L2)) leads to more improvements than using the same model as the teacher (denoted as Noisy Student Training (X)). Models smaller than EfficientNet-B5 are trained for 700 epochs (better than training for 350 epochs as used in Study #4 to Study #8). Models other than EfficientNet-B0 use an unlabeled batch size of three times the labeled batch size, while other ablation studies set the unlabeled batch size to be the same as the labeled batch size by default for models smaller than B7
  • Table11: Noisy Student Training’s performance improves with more unlabeled data. Models are trained for 700 epochs without iterative training. The baseline model achieves an accuracy of 83.2%
  • Table12: Using a larger student model leads to better performance. Student models are trained for 350 epochs instead of 700 epochs without iterative training. The B7 teacher with an accuracy of 86.9% is trained by Noisy Student Training with multiple iterations using B7. The comparison between B7 and L2 as student models is not completely fair for L2, since we use an unlabeled batch size of 3x the labeled batch size for training L2, which is not as good as using an unlabeled batch size of 7x the labeled batch size when training B7 (See Study #7 for more details)
  • Table13: Data balancing leads to better results for small models. Models are trained for 350 epochs instead of 700 epochs without iterative training
  • Table14: Joint training works better than pretraining and finetuning. We vary the finetuning steps and report the best results. Models are trained for 350 epochs instead of 700 epochs without iterative training
  • Table15: With a fixed labeled batch size, a larger unlabeled batch size leads to better performance for EfficientNet-L2. The Batch Size Ratio denotes the ratio between unlabeled batch size and labeled batch size
  • Table16: A student initialized with the teacher still requires at least 140 epochs to perform well. The baseline model, trained with labeled data only, has an accuracy of 77.3%
  • Table17: Experiments on ResNet-50
  • Table18: Results on SVHN
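    For reference, the mCE of Table 4 and the mFR of Table 5 follow the definitions of Hendrycks and Dietterich; the restatement below uses our own notation (not quoted from the paper), where $E^{f}_{s,c}$ is classifier $f$'s error rate on corruption type $c$ at severity $s$, and $\mathrm{FP}^{f}_{p}$ is the probability that $f$ flips its prediction between consecutive frames of a sequence perturbed by $p$.

\[
\mathrm{CE}^{f}_{c} = \frac{\sum_{s=1}^{5} E^{f}_{s,c}}{\sum_{s=1}^{5} E^{\mathrm{AlexNet}}_{s,c}},
\qquad
\mathrm{mCE}^{f} = \frac{1}{|C|} \sum_{c \in C} \mathrm{CE}^{f}_{c}
\]
\[
\mathrm{FR}^{f}_{p} = \frac{\mathrm{FP}^{f}_{p}}{\mathrm{FP}^{\mathrm{AlexNet}}_{p}},
\qquad
\mathrm{mFR}^{f} = \frac{1}{|P|} \sum_{p \in P} \mathrm{FR}^{f}_{p}
\]

    Both quantities are reported as percentages; values below 100 mean the model degrades less than the AlexNet baseline under the same corruptions or perturbations.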
Related work
  • Self-training. Our work is based on self-training (e.g., [71, 96, 68, 67]). Self-training first uses labeled data to train a good teacher model, then uses the teacher model to label unlabeled data, and finally uses the labeled and pseudo-labeled data to jointly train a student model. In typical self-training with the teacher-student framework, noise injection to the student is not used by default, or the role of noise is not fully understood or justified. The main difference between our work and prior works is that we identify the importance of noise and aggressively inject noise to make the student better.

    Self-training was previously used to improve ResNet-50 from 76.4% to 81.2% top-1 accuracy [93], which is still far from the state-of-the-art accuracy. Yalniz et al. [93] also did not show significant improvements in terms of robustness on ImageNet-A, C, and P as we did. In terms of methodology, they proposed to first train only on unlabeled images and then finetune the model on labeled images as the final stage. In Noisy Student Training, we combine these two steps into one because it simplifies the algorithm and leads to better performance in our experiments; a sketch of the combined objective follows.
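    Concretely, combining the two steps means optimizing a single loss over a labeled batch $B_{\ell}$ and an unlabeled batch $B_{u}$ in each update. The equation below is our sketch of such a joint objective (not a formula quoted from the paper), where $\ell$ is the cross-entropy loss, $f_{\theta}$ is the student with noise applied during training, and $\tilde{y}_{j}$ is the teacher's pseudo label for unlabeled image $x_{j}$.

\[
\mathcal{L}(\theta) = \frac{1}{|B_{\ell}|} \sum_{(x_i, y_i) \in B_{\ell}} \ell\big(y_i, f_{\theta}(\mathrm{noise}(x_i))\big)
\;+\;
\frac{1}{|B_{u}|} \sum_{x_j \in B_{u}} \ell\big(\tilde{y}_{j}, f_{\theta}(\mathrm{noise}(x_j))\big)
\]

    Tables 7 and 15 indicate that the ratio between the unlabeled and labeled batch sizes matters, with larger unlabeled batches helping the largest students.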
Funding
  • Presents Noisy Student Training, a semi-supervised learning approach that works well even when labeled data is abundant
  • Noisy Student Training achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images
  • Presents an ablation study on the effects of noise in Section 4.1
  • Finds that using a batch size of 512, 1024, and 2048 leads to the same performance
  • Improves EfficientNet’s ImageNet top-1 accuracy to 88.4%. This accuracy is 2.0% better than the previous SOTA result, which requires 3.5B weakly labeled Instagram images
Reference
  • Eric Arazo, Diego Ortego, Paul Albert, Noel E O’Connor, and Kevin McGuinness. Pseudo-labeling and confirmation bias in deep semi-supervised learning. arXiv preprint arXiv:1908.02983, 2019. 3, 9
  • Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, and Andrew Gordon Wilson. There are many consistent explanations of unlabeled data: Why you should average. In International Conference on Learning Representations, 2018. 9
  • Jimmy Ba and Rich Caruana. Do deep nets really need to be deep? In Advances in Neural Information Processing Systems, pages 2654–2662, 2014. 9
  • Yauhen Babakhin, Artsiom Sanakoyeu, and Hirotoshi Kitamura. Semi-supervised segmentation of salt bodies in seismic images using an ensemble of convolutional neural networks. arXiv preprint arXiv:1904.04445, 2019. 9
  • Philip Bachman, Ouais Alsharif, and Doina Precup. Learning with pseudo-ensembles. In Advances in Neural Information Processing Systems, pages 3365–3373, 2014. 3, 9
  • Anoop Korattikara Balan, Vivek Rathod, Kevin P Murphy, and Max Welling. Bayesian dark knowledge. In Advances in Neural Information Processing Systems, pages 3438– 3446, 2015. 9
  • David Berthelot, Nicholas Carlini, Ekin D Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, and Colin Raffel. Remixmatch: Semi-supervised learning with distribution alignment and augmentation anchoring. arXiv preprint arXiv:1911.09785, 2019. 9
  • David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. Mixmatch: A holistic approach to semi-supervised learning. In Advances in Neural Information Processing Systems, 2019. 3, 9
  • Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the eleventh annual conference on Computational learning theory, pages 92–100. ACM, 1998. 9
  • Cristian Bucilu, Rich Caruana, and Alexandru NiculescuMizil. Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 535–541. ACM, 2006. 9
  • Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C Duchi. Unlabeled data improves adversarial robustness. arXiv preprint arXiv:1905.13736, 2019. 9
  • Olivier Chapelle, Bernhard Scholkopf, and Alexander Zien. Semi-supervised learning (chapelle, o. et al., eds.; 2006)[book reviews]. IEEE Transactions on Neural Networks, 20(3):542–542, 2009. 3, 9
  • Yanbei Chen, Xiatian Zhu, and Shaogang Gong. Semisupervised deep learning with memory. In Proceedings of the European Conference on Computer Vision (ECCV), pages 268–283, 2018. 9
  • Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. Semi-supervised learning for neural machine translation. arXiv preprint arXiv:1606.04596, 2016. 9
  • Francois Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1251–1258, 2017. 3, 4
  • Kevin Clark, Minh-Thang Luong, Christopher D Manning, and Quoc V Le. Semi-supervised sequence modeling with cross-view training. In Empirical Methods in Natural Language Processing (EMNLP), 2018. 9
  • Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. AutoAugment: Learning augmentation strategies from data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 4
  • Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical data augmentation with no separate search. arXiv preprint arXiv:1909.13719, 2019. 1, 2, 3
  • Zihang Dai, Zhilin Yang, Fan Yang, William W Cohen, and Ruslan R Salakhutdinov. Good semi-supervised learning that requires a bad gan. In Advances in Neural Information Processing Systems, pages 6510–6520, 2017. 9
  • Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. Understanding back-translation at scale. In Proceedings of the 2018 conference on Empirical methods in natural language processing, pages 489–500, 2018. 9
  • Tommaso Furlanello, Zachary C Lipton, Michael Tschannen, Laurent Itti, and Anima Anandkumar. Born again neural networks. In International Conference on Machine Learning, 2018. 9
  • Angus Galloway, Anna Golubeva, Thomas Tanay, Medhat Moussa, and Graham W Taylor. Batch normalization is a cause of adversarial vulnerability. arXiv preprint arXiv:1905.02161, 2019. 7
  • Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A Wichmann, and Wieland Brendel. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International Conference on Learning Representations, 2019. 5
  • Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. arXiv preprint arXiv:1801.02774, 2018. 7
  • Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015. 7
  • Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Advances in neural information processing systems, pages 529–536, 2005. 9
  • Keren Gu, Brandon Yang, Jiquan Ngiam, Quoc Le, and Jonathan Shlens. Using videos to evaluate image model robustness. In ICLR Workshop, 2019. 9
  • Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. Dual learning for machine translation. In Advances in Neural Information Processing Systems, pages 820–828, 2016. 9
  • Junxian He, Jiatao Gu, Jiajun Shen, and Marc’Aurelio Ranzato. Revisiting self-training for neural sequence generation. arXiv preprint arXiv:1909.13788, 2019. 9
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. 1, 4, 17
  • Dan Hendrycks and Thomas G Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations, 2019. 1, 5, 9, 17
  • Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. arXiv preprint arXiv:1907.07174, 2019. 1, 5
  • Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015. 2, 3, 9
  • Andrew G Howard. Some improvements on deep convolutional neural network based image classification. arXiv preprint arXiv:1312.5402, 2013. 18
  • Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7132–7141, 2018. 4
  • Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017. 4
  • Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q Weinberger. Deep networks with stochastic depth. In European conference on computer vision, pages 646–661. Springer, 2016. 1, 2, 3
  • Yanping Huang, Yonglong Cheng, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V Le, and Zhifeng Chen. GPipe: Efficient training of giant neural networks using pipeline parallelism. In Advances in Neural Information Processing Systems, 2019. 4
  • Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, and Ondrej Chum. Label propagation for deep semi-supervised learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5070–5079, 2019. 3, 9
  • Giannis Karamanolakis, Daniel Hsu, and Luis Gravano. Leveraging just a few keywords for fine-grained aspect detection through weakly supervised co-training. Empirical Methods in Natural Language Processing (EMNLP), 2019. 9
  • Durk P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. Semi-supervised learning with deep generative models. In Advances in neural information processing systems, pages 3581–3589, 2014. 9
  • Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016. 9
  • Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, and Neil Houlsby. Large scale learning of general visual representations for transfer. arXiv preprint arXiv:1912.11370, 2019. 4
  • Simon Kornblith, Jonathon Shlens, and Quoc V Le. Do better imagenet models transfer better? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2661–2671, 2019. 3
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012. 1, 4
  • Guokun Lai, Barlas Oguz, and Veselin Stoyanov. Bridging the domain gap in cross-lingual document classification. arXiv preprint arXiv:1909.07009, 2019. 9
  • Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. In International Conference on Learning Representations, 2017. 3, 9
  • Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, volume 3, page 2, 2013. 3, 9
  • Yingting Li, Lu Liu, and Robby T Tan. Certaintydriven consistency loss for semi-supervised learning. arXiv preprint arXiv:1901.05657, 2019. 9
  • Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. Progressive neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV), pages 19–34, 2018. 4
  • Raphael Gontijo Lopes, Dong Yin, Ben Poole, Justin Gilmer, and Ekin D Cubuk. Improving robustness without sacrificing accuracy with patch gaussian augmentation. arXiv preprint arXiv:1906.02611, 2019. 5
  • Yucen Luo, Jun Zhu, Mengxi Li, Yong Ren, and Bo Zhang. Smooth neighbors on teacher graphs for semi-supervised learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8896–8905, 2018. 9
  • Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, and Ole Winther. Auxiliary deep generative models. arXiv preprint arXiv:1602.05473, 2016. 9
  • Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations, 2018. 7
  • Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, and Laurens van der Maaten. Exploring the limits of weakly supervised pretraining. In Proceedings of the European Conference on Computer Vision (ECCV), pages 181–196, 2018. 1, 4, 5, 10, 18
  • Takeru Miyato, Shin-ichi Maeda, Shin Ishii, and Masanori Koyama. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence, 2018. 3, 9
  • Amir Najafi, Shin-ichi Maeda, Masanori Koyama, and Takeru Miyato. Robustness to adversarial perturbations in learning from incomplete data. In Advances in Neural Information Processing Systems, 2019. 9
  • Jiquan Ngiam, Daiyi Peng, Vijay Vasudevan, Simon Kornblith, Quoc V Le, and Ruoming Pang. Domain adaptive transfer learning with specialist models. arXiv preprint arXiv:1811.07056, 2018. 3
  • A Emin Orhan. Robustness properties of facebook’s resnext wsl models. arXiv preprint arXiv:1907.07640, 2019. 5
  • Sungrae Park, JunKeon Park, Su-Jin Shin, and Il-Chul Moon. Adversarial dropout for supervised and semisupervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018. 9
  • Sree Hari Krishnan Parthasarathi and Nikko Strom. Lessons from building acoustic models with a million hours of speech. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6670–6674. IEEE, 2019. 9
  • Siyuan Qiao, Wei Shen, Zhishuai Zhang, Bo Wang, and Alan Yuille. Deep co-training for semi-supervised image recognition. In Proceedings of the European Conference on Computer Vision (ECCV), pages 135–152, 2018. 9
  • Ilija Radosavovic, Piotr Dollar, Ross Girshick, Georgia Gkioxari, and Kaiming He. Data distillation: Towards omni-supervised learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4119–4128, 2018. 9
  • Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. Semi-supervised learning with ladder networks. In Advances in neural information processing systems, pages 3546–3554, 2015. 3, 9
  • Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4780–4789, 2019. 4
  • Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do imagenet classifiers generalize to imagenet? International Conference on Machine Learning, 2019. 3, 4, 9
  • Ellen Riloff. Automatically generating extraction patterns from untagged text. In Proceedings of the national conference on artificial intelligence, pages 1044–1049, 1996. 9
  • Ellen Riloff and Janyce Wiebe. Learning extraction patterns for subjective expressions. In Proceedings of the 2003 conference on Empirical methods in natural language processing, pages 105–112, 2003. 9
  • Aruni Roy Chowdhury, Prithvijit Chakrabarty, Ashish Singh, SouYoung Jin, Huaizu Jiang, Liangliang Cao, and Erik G. Learned-Miller. Automatic adaptation of object detectors to new domains using self-training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 9
  • Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In Advances in neural information processing systems, pages 2234–2242, 2016. 9
  • H Scudder. Probability of error of some adaptive patternrecognition machines. IEEE Transactions on Information Theory, 11(3):363–371, 1965. 2, 9
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709, 2015. 9
  • Weiwei Shi, Yihong Gong, Chris Ding, Zhiheng MaXiaoyu Tao, and Nanning Zheng. Transductive semisupervised deep learning using min-max features. In Proceedings of the European Conference on Computer Vision (ECCV), pages 299–315, 2018. 3, 9
  • Carl-Johann Simon-Gabriel, Yann Ollivier, Leon Bottou, Bernhard Scholkopf, and David Lopez-Paz. First-order adversarial vulnerability of neural networks and input dimension. In International Conference on Machine Learning, pages 5809–5817, 2019. 7
  • Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015. 1
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958, 2014. 1, 2, 3
  • Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli, et al. Are labels required for improving adversarial robustness? arXiv preprint arXiv:1905.13725, 2019. 9
  • Qianru Sun, Xinzhe Li, Yaoyao Liu, Shibao Zheng, TatSeng Chua, and Bernt Schiele. Learning to self-train for semi-supervised few-shot classification. arXiv preprint arXiv:1906.00562, 2019. 9
  • Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017. 4
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015. 1, 4, 18
  • Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016. 4
  • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013. 9
  • Mingxing Tan and Quoc V Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, 2019. 1, 3, 4, 5, 14
  • Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems, pages 1195–1204, 2017. 3, 9
  • Bart Thomee, David A Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. Yfcc100m: The new data in multimedia research. Communications of the ACM, 59(2):64–73, 2016. 3, 8
  • Hugo Touvron, Andrea Vedaldi, Matthijs Douze, and Herve Jegou. Fixing the train-test resolution discrepancy. arXiv preprint arXiv:1906.06423, 2019. 1, 3, 4, 10
  • Andreas Veit, Neil Alldrin, Gal Chechik, Ivan Krasin, Abhinav Gupta, and Serge Belongie. Learning from noisy large-scale datasets with minimal supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 839–847, 2017. 9
  • Vikas Verma, Alex Lamb, Juho Kannala, Yoshua Bengio, and David Lopez-Paz. Interpolation consistency training for semi-supervised learning. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), 2019. 9
  • Jason Weston, Frederic Ratle, Hossein Mobahi, and Ronan Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages 639–655. Springer, 2012. 9
  • Lijun Wu, Yiren Wang, Yingce Xia, QIN Tao, Jianhuang Lai, and Tie-Yan Liu. Exploiting monolingual data at scale for neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4198–4207, 2019. 9
  • Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, and Quoc V Le. Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848, 2019. 2, 3, 9
  • Saining Xie, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1492–1500, 2017. 4
  • I. Zeki Yalniz, Herv’e J’egou, Kan Chen, Manohar Paluri, and Dhruv Mahajan. Billion-scale semi-supervised learning for image classification. Arxiv 1905.00546, 2019. 2, 4, 9, 16
  • Zhilin Yang, William W Cohen, and Ruslan Salakhutdinov. Revisiting semi-supervised learning with graph embeddings. arXiv preprint arXiv:1603.08861, 2016. 9
  • Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, and William W Cohen. Semi-supervised qa with generative domain-adaptive nets. arXiv preprint arXiv:1702.02206, 2017. 9
  • David Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In 33rd annual meeting of the association for computational linguistics, pages 189–196, 1995. 2, 9
  • Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, and Liwei Wang. Adversarially robust generalization just requires more unlabeled data. arXiv preprint arXiv:1906.00555, 2019. 9
  • Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov, and Lucas Beyer. S4L: Self-supervised semi-supervised learning. In Proceedings of the IEEE international conference on computer vision, 2019. 9
  • Richard Zhang. Making convolutional networks shiftinvariant again. In International Conference on Machine Learning, 2019. 5
  • Xingcheng Zhang, Zhizhong Li, Chen Change Loy, and Dahua Lin. Polynet: A pursuit of structural diversity in very deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 718–726, 2017. 4
  • Giulio Zhou, Subramanya Dulloor, David G Andersen, and Michael Kaminsky. Edf: Ensemble, distill, and fuse for easy video labeling. arXiv preprint arXiv:1812.03626, 2018. 9
  • Xiaojin Zhu, Zoubin Ghahramani, and John D Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International conference on Machine learning (ICML-03), pages 912– 919, 2003. 9
  • Xiaojin Jerry Zhu. Semi-supervised learning literature survey. Technical report, University of Wisconsin-Madison Department of Computer Sciences, 2005. 3, 9
  • Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8697–8710, 2018. 4
Findings
  • (1) Soft pseudo labels and hard pseudo labels can both lead to significant improvements with in-domain unlabeled images, i.e., high-confidence images.
  • (2) With out-of-domain unlabeled images, hard pseudo labels can hurt performance, while soft pseudo labels lead to robust performance.