Unsupervised Data Augmentation for Consistency Training

NeurIPS, 2019.

Keywords:
Python Image Library, original example, unlabeled example, augmentation transformation, data augmentation method

Abstract:

Semi-supervised learning lately has shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically that produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. ...
Introduction
Highlights
  • A fundamental weakness of deep learning is that it typically requires a lot of labeled data to work well
  • We propose to use a rich set of state-of-the-art data augmentations verified in various supervised settings to inject noise and optimize the same consistency training objective on unlabeled examples
  • In order to test whether UNSUPERVISED DATA AUGMENTATION (UDA) can be combined with the success of unsupervised representation learning, such as BERT (Devlin et al, 2018), we further consider four initialization schemes: (a) a random Transformer; (b) BERT-Base; (c) BERT-Large; (d) BERT-Finetune: BERT-Large fine-tuned on in-domain unlabeled data
  • We show that data augmentation and semi-supervised learning are well connected: better data augmentation can lead to significantly better semi-supervised learning
  • UDA employs state-of-the-art data augmentation found in supervised learning to generate diverse and realistic noise, and enforces the model to be consistent with respect to this noise (a minimal sketch of this objective follows this list)
  • We theoretically analyze why UDA can improve the performance of a model and the required number of labeled examples to achieve a certain error rate
  • UDA outperforms prior work by a clear margin and nearly matches the performance of fully supervised models trained on full labeled sets that are an order of magnitude larger
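To make the consistency training objective above concrete, here is a minimal sketch (not the authors' released code), assuming a PyTorch classifier `model` that returns logits; `lam` is the weight on the unlabeled consistency term, and `x_unlabeled_aug` would come from RandAugment for images or back-translation for text:

```python
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, x_unlabeled_aug, lam=1.0):
    # Supervised cross-entropy on the (small) labeled batch.
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # Consistency term: predictions on the clean unlabeled example are treated
    # as fixed targets (no gradient); predictions on the augmented version are
    # trained to match them via KL divergence.
    with torch.no_grad():
        p_clean = F.softmax(model(x_unlabeled), dim=-1)
    log_p_aug = F.log_softmax(model(x_unlabeled_aug), dim=-1)
    consistency = F.kl_div(log_p_aug, p_clean, reduction="batchmean")

    return sup_loss + lam * consistency
```

The paper additionally sharpens and confidence-masks the clean-example predictions in some settings; those refinements are omitted here for brevity.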
Methods
  • 55.09 / 77.26 77.28 / 93.73 58.84 / 80.56 78.43 / 94.37. UDA (RandAugment) 68.78 / 88.80 79.05 / 94.49
Results
Conclusion
  • Before detailing the augmentation operations used in this work, the authors first give some intuition on how more advanced data augmentations can offer extra advantages over the simple ones used in earlier works, from three aspects:

    Valid noise: Advanced data augmentation methods that achieve great performance in supervised learning usually generate realistic augmented examples that share the same ground-truth label as the original example.
  • Although state-of-the-art data augmentation methods can generate diverse and valid augmented examples as discussed in Section 2.2, there is a trade-off between diversity and validity, since diversity is achieved by changing part of the original example, which naturally risks altering the ground-truth label (the sketch after this list illustrates the knob controlling this trade-off).
  • UDA employs state-of-the-art data augmentation found in supervised learning to generate diverse and realistic noise, and enforces the model to be consistent with respect to this noise.
  • The authors hope that UDA will encourage future research to transfer advanced supervised augmentation to the semi-supervised setting for different tasks.
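To illustrate the diversity-validity trade-off discussed above, the sketch below contrasts the weak augmentations used in earlier consistency-training work with a stronger, RandAugment-style policy. It assumes a recent torchvision that ships `transforms.RandAugment`; the `num_ops` and `magnitude` values are illustrative, not the paper's exact settings:

```python
from torchvision import transforms

# Weak noise used in earlier consistency-training work: mild, very likely
# label-preserving, but not very diverse.
weak_noise = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Stronger, RandAugment-style noise: more diverse augmented views, at the cost
# of a higher risk of altering the ground-truth label as the magnitude grows.
strong_noise = transforms.Compose([
    transforms.RandAugment(num_ops=2, magnitude=9),
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```

Raising `num_ops` or `magnitude` moves along the trade-off: the augmented views become more diverse, but some operations may no longer preserve the label.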
Summary
  • Introduction:

    A fundamental weakness of deep learning is that it typically requires a lot of labeled data to work well.
  • Consistency training methods regularize model predictions to be invariant to small noise applied to either input examples (Miyato et al, 2018; Sajjadi et al, 2016; Clark et al, 2018) or hidden states (Bachman et al, 2014; Laine & Aila, 2016).
  • How to design the augmentation transformation has thus become critical (a back-translation sketch for text follows this block)
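For text, the augmentation transformation is back-translation. The sketch below is a minimal illustration assuming the Hugging Face transformers library and the public Helsinki-NLP/opus-mt-en-de and opus-mt-de-en checkpoints; the paper itself uses its own English-French translation models with sampled decoding:

```python
from transformers import MarianMTModel, MarianTokenizer

def _translate(sentences, model_name):
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tok(sentences, return_tensors="pt", padding=True, truncation=True)
    # Sampling (rather than greedy decoding) keeps the paraphrases diverse.
    out = model.generate(**batch, do_sample=True, top_k=50)
    return tok.batch_decode(out, skip_special_tokens=True)

def back_translate(sentences):
    # English -> German -> English yields label-preserving paraphrases that
    # serve as the "noise" on unlabeled text examples.
    german = _translate(sentences, "Helsinki-NLP/opus-mt-en-de")
    return _translate(german, "Helsinki-NLP/opus-mt-de-en")
```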
  • Methods:

    Top-1 / top-5 accuracy on ImageNet (see Table 5): with 10% of the labels, the two supervised baselines reach 55.09 / 77.26 and 58.84 / 80.56, while UDA (RandAugment) reaches 68.78 / 88.80; with the full labeled set, the corresponding numbers are 77.28 / 93.73, 78.43 / 94.37, and 79.05 / 94.49
  • Results:

    Evaluation on text classification datasets: the authors further evaluate UDA in the language domain.
  • In order to test whether UDA can be combined with the success of unsupervised representation learning, such as BERT (Devlin et al, 2018), the authors further consider four initialization schemes: (a) a random Transformer; (b) BERT-Base; (c) BERT-Large; (d) BERT-Finetune: BERT-Large fine-tuned on in-domain unlabeled data (a sketch of these initialization options follows this block).
  • Under each of these four initialization schemes, the authors compare the performances with and without UDA.
  • Baselines compared against include Π-Model (Laine & Aila, 2016), Mean Teacher (Tarvainen & Valpola, 2017), VAT + EntMin (Miyato et al, 2018), SNTG (Luo et al, 2018), VAdD (Park et al, 2018), Fast-SWA (Athiwaratkun et al, 2018), ICT (Verma et al, 2019), Pseudo-Label (Lee, 2013), LGA + VAT (Jackson & Schulman, 2019), mixmixup (Hataya & Nakayama, 2019), and MixMatch (Berthelot et al, 2019).
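The four initialization schemes above can be summarized with the following sketch, which assumes the Hugging Face transformers library; the checkpoint path for the in-domain fine-tuned model is a placeholder, not a name from the paper:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

def build_classifier(scheme, num_labels=2):
    if scheme == "random_transformer":
        # (a) BERT-Base architecture with randomly initialized weights.
        config = AutoConfig.from_pretrained("bert-base-uncased", num_labels=num_labels)
        return AutoModelForSequenceClassification.from_config(config)
    if scheme == "bert_base":
        # (b) public BERT-Base checkpoint.
        return AutoModelForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=num_labels)
    if scheme == "bert_large":
        # (c) public BERT-Large checkpoint.
        return AutoModelForSequenceClassification.from_pretrained(
            "bert-large-uncased", num_labels=num_labels)
    if scheme == "bert_finetune":
        # (d) BERT-Large first fine-tuned (masked LM) on in-domain unlabeled
        # text, then used to initialize the classifier; path is a placeholder.
        return AutoModelForSequenceClassification.from_pretrained(
            "path/to/bert-large-indomain-mlm", num_labels=num_labels)
    raise ValueError(f"unknown scheme: {scheme}")
```

Under each scheme, the classifier is then trained with and without the UDA consistency loss for comparison.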
  • Conclusion:

    Before detailing the augmentation operations used in this work, the authors first give some intuition on how more advanced data augmentations can offer extra advantages over the simple ones used in earlier works, from three aspects:

    Valid noise: Advanced data augmentation methods that achieve great performance in supervised learning usually generate realistic augmented examples that share the same ground-truth label as the original example.
  • Although state-of-the-art data augmentation methods can generate diverse and valid augmented examples as discussed in Section 2.2, there is a trade-off between diversity and validity, since diversity is achieved by changing part of the original example, which naturally risks altering the ground-truth label.
  • UDA employs state-of-the-art data augmentation found in supervised learning to generate diverse and realistic noise, and enforces the model to be consistent with respect to this noise.
  • The authors hope that UDA will encourage future research to transfer advanced supervised augmentation to the semi-supervised setting for different tasks.
Tables
  • Table1: Error rates on CIFAR-10
  • Table2: Error rate on Yelp-5
  • Table3: Comparison between methods using different models where PyramidNet is used with ShakeDrop regularization. On CIFAR-10, with only 4,000 labeled examples, UDA matches the performance of fully supervised Wide-ResNet-28-2 and PyramidNet+ShakeDrop, where they have an error rate of 5.4 and 2.7 respectively when trained on 50,000 examples without RandAugment. On SVHN, UDA also matches the performance of our fully supervised model trained on 73,257 examples without RandAugment, which has an error rate of 2.84
  • Table4: Error rates on text classification datasets. In the fully supervised settings, the pre-BERT SOTAs include ULMFiT (Howard & Ruder, 2018) for Yelp-2 and Yelp-5, DPCNN (Johnson & Zhang, 2017) for Amazon-2 and Amazon-5, and Mixed VAT (Sachan et al, 2018) for IMDb and DBPedia. All of our experiments use a sequence length of 512
  • Table5: Top-1 / top-5 accuracy on ImageNet with 10% and 100% of the labeled set. We use image size 224 and 331 for the 10% and 100% experiments respectively
  • Table6: Error rate (%) for CIFAR-10 with different amounts of labeled data and unlabeled data
  • Table7: Error rate (%) for SVHN with different amounts of labeled data and unlabeled data
  • Table8: Ablation study for Training Signal Annealing (TSA) on Yelp-5 and CIFAR-10 (a sketch of TSA follows this list). The shown numbers are error rates
  • Table9: Error rate (%) for CIFAR-10
  • Table10: Error rate (%) for SVHN
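Training Signal Annealing, ablated in Table 8, gradually releases the supervised training signal: a labeled example contributes to the loss only while the model's probability on its correct class is below a threshold annealed from 1/K to 1 over training. The sketch below is a minimal rendering of that idea with the log, linear, and exp schedules described in the paper, not the authors' released implementation:

```python
import math
import torch
import torch.nn.functional as F

def tsa_threshold(step, total_steps, num_classes, schedule="linear"):
    # alpha ramps from 0 to 1; the threshold ramps from 1/K to 1.
    t = step / total_steps
    if schedule == "linear":
        alpha = t
    elif schedule == "log":
        alpha = 1.0 - math.exp(-t * 5.0)
    elif schedule == "exp":
        alpha = math.exp((t - 1.0) * 5.0)
    else:
        raise ValueError(schedule)
    return alpha * (1.0 - 1.0 / num_classes) + 1.0 / num_classes

def tsa_cross_entropy(logits, labels, step, total_steps, schedule="linear"):
    probs = F.softmax(logits, dim=-1)
    correct_prob = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    threshold = tsa_threshold(step, total_steps, logits.size(1), schedule)
    keep = (correct_prob < threshold).float()  # mask out already-confident examples
    per_example = F.cross_entropy(logits, labels, reduction="none")
    return (per_example * keep).sum() / keep.sum().clamp(min=1.0)
```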
Related work
  • Existing works in consistency training do make use of data augmentation (Laine & Aila, 2016; Sajjadi et al, 2016); however, they only apply weak augmentation methods such as random translation and cropping. In parallel to our work, ICT (Verma et al, 2019) and MixMatch (Berthelot et al, 2019) also show improvements for semi-supervised learning. These methods employ mixup (Zhang et al, 2017) on top of simple augmentations such as flipping and cropping; in contrast, UDA emphasizes the use of state-of-the-art data augmentations, leading to significantly better results on CIFAR-10 and SVHN. In addition, UDA is also applicable to the language domain and can scale well to more challenging vision datasets, such as ImageNet.

    Other works in the consistency training family mostly differ in how the noise is defined: Pseudoensemble (Bachman et al, 2014) directly applies Gaussian noise and Dropout noise; VAT (Miyato et al, 2018; 2016) defines the noise by approximating the direction of change in the input space that the model is most sensitive to; Cross-view training (Clark et al, 2018) masks out part of the input data. Apart from enforcing consistency on the input examples and the hidden representations, another line of research enforces consistency on the model parameter space. Works in this category include Mean Teacher (Tarvainen & Valpola, 2017), fast-Stochastic Weight Averaging (Athiwaratkun et al, 2018) and Smooth Neighbors on Teacher Graphs (Luo et al, 2018). For a complete version of related work, please refer to Appendix D.
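Since the methods above differ mainly in how the noise is defined, a small sketch makes the contrast explicit; the function names are illustrative and not taken from any of the cited implementations:

```python
import torch

def gaussian_noise(x, sigma=0.1):
    # Pseudo-ensemble-style noise: additive Gaussian perturbation of the input.
    return x + sigma * torch.randn_like(x)

def token_mask_noise(token_ids, mask_id, p=0.15):
    # Cross-view-training-style noise: hide a random subset of input tokens.
    mask = torch.rand(token_ids.shape) < p
    return torch.where(mask, torch.full_like(token_ids, mask_id), token_ids)

# VAT instead computes the perturbation direction the model is most sensitive
# to; UDA replaces these simple perturbations with strong data augmentation
# (RandAugment for images, back-translation for text) inside the same
# consistency objective.
```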
References
  • Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, and Andrew Gordon Wilson. There are many consistent explanations of unlabeled data: Why you should average. 2018.
  • Philip Bachman, Ouais Alsharif, and Doina Precup. Learning with pseudo-ensembles. In Advances in Neural Information Processing Systems, pp. 3365–3373, 2014.
  • David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. MixMatch: A holistic approach to semi-supervised learning. arXiv preprint arXiv:1905.02249, 2019.
  • Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C Duchi. Unlabeled data improves adversarial robustness. arXiv preprint arXiv:1905.13736, 2019.
  • Olivier Chapelle, Bernhard Scholkopf, and Alexander Zien. Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [book reviews]. IEEE Transactions on Neural Networks, 20(3):542–542, 2009.
  • Francois Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258, 2017.
  • Kevin Clark, Minh-Thang Luong, Christopher D Manning, and Quoc V Le. Semi-supervised sequence modeling with cross-view training. arXiv preprint arXiv:1809.08370, 2018.
  • Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, pp. 160–167. ACM, 2008.
  • Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. AutoAugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501, 2018.
  • Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. RandAugment: Practical data augmentation with no separate search. arXiv preprint arXiv:1909.13719, 2019.
  • Andrew M Dai and Quoc V Le. Semi-supervised sequence learning. In Advances in Neural Information Processing Systems, pp. 3079–3087, 2015.
  • Zihang Dai, Zhilin Yang, Fan Yang, William W Cohen, and Ruslan R Salakhutdinov. Good semi-supervised learning that requires a bad GAN. In Advances in Neural Information Processing Systems, pp. 6510–6520, 2017.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  • Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. Understanding back-translation at scale. arXiv preprint arXiv:1808.09381, 2018.
  • Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Advances in Neural Information Processing Systems, pp. 529–536, 2005.
  • Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, et al. Deep Speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567, 2014.
  • Ryuichiro Hataya and Hideki Nakayama. Unifying semi-supervised and robust learning by mixup. ICLR The 2nd Learning from Limited Labeled Data (LLD) Workshop, 2019.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
  • Xuanli He, Gholamreza Haffari, and Mohammad Norouzi. Sequence to sequence mixture model for diverse machine translation. arXiv preprint arXiv:1810.07391, 2018.
  • Olivier J Henaff, Ali Razavi, Carl Doersch, SM Eslami, and Aaron van den Oord. Data-efficient image recognition with contrastive predictive coding. arXiv preprint arXiv:1905.09272, 2019.
  • Alex Hernández-García and Peter König. Data augmentation instead of explicit regularization. arXiv preprint arXiv:1806.03852, 2018.
  • Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
  • Jeremy Howard and Sebastian Ruder. Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 328–339, 2018.
  • Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, and Masashi Sugiyama. Learning discrete representations via information maximizing self-augmented training. In Proceedings of the 34th International Conference on Machine Learning, pp. 1558–1567, 2017.
  • Jacob Jackson and John Schulman. Semi-supervised learning by label gradient alignment. arXiv preprint arXiv:1902.02336, 2019.
  • Rie Johnson and Tong Zhang. Deep pyramid convolutional neural networks for text categorization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 562–570, 2017.
  • Durk P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, pp. 3581–3589, 2014.
  • Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • Wouter Kool, Herke van Hoof, and Max Welling. Stochastic beams and where to find them: The Gumbel-top-k trick for sampling sequences without replacement. arXiv preprint arXiv:1903.06059, 2019.
  • Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, 2009.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
  • Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
  • Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, volume 3, pp. 2, 2013.
  • Davis Liang, Zhiheng Huang, and Zachary C Lipton. Learning noise-invariant representations for robust speech recognition. In 2018 IEEE Spoken Language Technology Workshop (SLT), pp. 56–63. IEEE, 2018.
  • Yucen Luo, Jun Zhu, Mengxi Li, Yong Ren, and Bo Zhang. Smooth neighbors on teacher graphs for semi-supervised learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8896–8905, 2018.
  • Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, and Ole Winther. Auxiliary deep generative models. arXiv preprint arXiv:1602.05473, 2016.
  • Andrew L Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150, 2011.
  • Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 43–52. ACM, 2015.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
  • Takeru Miyato, Andrew M Dai, and Ian Goodfellow. Adversarial training methods for semi-supervised text classification. arXiv preprint arXiv:1605.07725, 2016.
  • Takeru Miyato, Shin-ichi Maeda, Shin Ishii, and Masanori Koyama. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
  • Amir Najafi, Shin-ichi Maeda, Masanori Koyama, and Takeru Miyato. Robustness to adversarial perturbations in learning from incomplete data. arXiv preprint arXiv:1905.13021, 2019.
  • Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. 2011.
  • Avital Oliver, Augustus Odena, Colin A Raffel, Ekin Dogus Cubuk, and Ian Goodfellow. Realistic evaluation of deep semi-supervised learning algorithms. In Advances in Neural Information Processing Systems, pp. 3235–3246, 2018.
  • Daniel S Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D Cubuk, and Quoc V Le. SpecAugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779, 2019.
  • Sungrae Park, JunKeon Park, Su-Jin Shin, and Il-Chul Moon. Adversarial dropout for supervised and semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.
  • Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. arXiv preprint arXiv:1802.05365, 2018.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. URL https://s3-us-west-2.amazonaws.com/openaiassets/research-covers/languageunsupervised/language understanding paper.pdf, 2018.
  • Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pp. 3546–3554, 2015.
  • Devendra Singh Sachan, Manzil Zaheer, and Ruslan Salakhutdinov. Revisiting LSTM networks for semi-supervised text classification via mixed objective function. 2018.
  • Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In Advances in Neural Information Processing Systems, pp. 1163–1171, 2016.
  • Julian Salazar, Davis Liang, Zhiheng Huang, and Zachary C Lipton. Invariant representation learning for robust deep networks. In Workshop on Integration of Deep Learning Theories, NeurIPS, 2018.
  • Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pp. 2234–2242, 2016.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709, 2015.
  • Tianxiao Shen, Myle Ott, Michael Auli, and Marc’Aurelio Ranzato. Mixture models for diverse machine translation: Tricks of the trade. arXiv preprint arXiv:1902.07816, 2019.
  • Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli, et al. Are labels required for improving adversarial robustness? arXiv preprint arXiv:1905.13725, 2019.
  • Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems, pp. 1195–1204, 2017.
  • Trieu H Trinh, Minh-Thang Luong, and Quoc V Le. Selfie: Self-supervised pretraining for image embedding. arXiv preprint arXiv:1906.02940, 2019.
  • Vikas Verma, Alex Lamb, Juho Kannala, Yoshua Bengio, and David Lopez-Paz. Interpolation consistency training for semi-supervised learning. arXiv preprint arXiv:1903.03825, 2019.
  • Xinyi Wang, Hieu Pham, Zihang Dai, and Graham Neubig. SwitchOut: An efficient data augmentation algorithm for neural machine translation. arXiv preprint arXiv:1808.07512, 2018.
  • Zhilin Yang, William W Cohen, and Ruslan Salakhutdinov. Revisiting semi-supervised learning with graph embeddings. arXiv preprint arXiv:1603.08861, 2016.
  • Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, and William W Cohen. Semi-supervised QA with generative domain-adaptive nets. arXiv preprint arXiv:1702.02206, 2017.
  • Mang Ye, Xu Zhang, Pong C Yuen, and Shih-Fu Chang. Unsupervised embedding learning via invariant and spreading instance feature. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6210–6219, 2019.
  • Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V Le. QANet: Combining local convolution with global self-attention for reading comprehension. arXiv preprint arXiv:1804.09541, 2018.
  • Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
  • Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, and Liwei Wang. Adversarially robust generalization just requires more unlabeled data. arXiv preprint arXiv:1906.00555, 2019a.
  • Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov, and Lucas Beyer. S4L: Self-supervised semi-supervised learning. In Proceedings of the IEEE International Conference on Computer Vision, 2019b.
  • Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
  • Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, pp. 649–657, 2015.
  • Xiaojin Zhu, Zoubin Ghahramani, and John D Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 912–919, 2003.