Interpolation Consistency Training for Semi-Supervised Learning

IJCAI, pp. 3635-3641, 2019.

Keywords:
  deep learning; unlabeled data; interpolation consistency training; density region; low-density region

Abstract:

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution.
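Below is a minimal PyTorch-style sketch of the ICT consistency term described in the abstract, assuming (as in related consistency methods) that a mean-teacher copy of the model provides the targets; the function name, the Beta(α, α) sampling, and the mean-squared-error distance are illustrative choices, not necessarily the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def ict_consistency_loss(student, teacher, u1, u2, alpha=1.0):
    """One ICT consistency term (a sketch, not the paper's exact code).

    The student's prediction at an interpolation of two unlabeled
    batches is pushed toward the same interpolation of the teacher's
    predictions at those batches.
    """
    # Interpolation coefficient from a Beta(alpha, alpha) distribution,
    # as in mixup-style training.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()

    mixed_input = lam * u1 + (1.0 - lam) * u2

    # The teacher (e.g. an exponential-moving-average copy of the
    # student) only provides targets, so no gradients flow through it.
    with torch.no_grad():
        target = lam * torch.softmax(teacher(u1), dim=1) \
               + (1.0 - lam) * torch.softmax(teacher(u2), dim=1)

    pred = torch.softmax(student(mixed_input), dim=1)
    return F.mse_loss(pred, target)
```

In the full objective, this term would typically be added to the cross-entropy loss on the labeled batch, scaled by a consistency weight that is ramped up over training.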

Introduction
  • Deep learning achieves excellent performance in supervised learning tasks where labeled data is abundant (LeCun et al, 2015).
  • The existence of cluster structures in the input distribution could hint at the separation of samples into different labels.
  • This is often called the cluster assumption: if two samples belong to the same cluster in the input distribution, they are likely to belong to the same class.
  • The low-density separation assumption has inspired many recent consistency-regularization semi-supervised learning methods.
Highlights
  • Deep learning achieves excellent performance in supervised learning tasks where labeled data is abundant (LeCun et al, 2015)
  • This is often called the cluster assumption: if two samples belong to the same cluster in the input distribution, they are likely to belong to the same class
  • The cluster assumption is equivalent to the low-density separation assumption: the decision boundary should lie in the low-density regions
  • The equivalence is easy to see: a decision boundary that lies in a high-density region would cut a cluster into two different classes, forcing samples from different classes to lie in the same cluster, which violates the cluster assumption
  • Machine learning is having a transformative impact on diverse areas, yet its application is often limited by the amount of available labeled data
  • We have proposed a simple but efficient semi-supervised learning algorithm, Interpolation Consistency Training (ICT), which has two advantages over previous approaches to semi-supervised learning
Methods
  • 3.1 Datasets

    The authors follow the common practice in the semi-supervised learning literature (Laine & Aila, 2016; Miyato et al, 2018; Tarvainen & Valpola, 2017; Park et al, 2018; Luo et al, 2018) and conduct experiments using the CIFAR-10 and SVHN datasets, where only a fraction of the training data is labeled and the remaining data is used as unlabeled data (a minimal sketch of this split appears after the dataset descriptions below).
  • The CIFAR-10 dataset consists of 60,000 color images, each of size 32 × 32, split into 50,000 training and 10,000 test images.
  • This dataset has ten classes, which include images of natural objects such as cars, horses, airplanes and deer.
  • In the SVHN dataset, each example is a close-up image of a house number, and the ten classes correspond to the digits 0–9.
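To make the labeled/unlabeled split described above concrete, here is a minimal sketch for CIFAR-10, assuming 4,000 labeled examples (the benchmark setting of Table 1); the paper's exact, class-balanced sampling may differ.

```python
import numpy as np
from torchvision.datasets import CIFAR10

# Download the 50,000 CIFAR-10 training images, then keep labels for
# only a small random subset; the rest are treated as unlabeled.
train_set = CIFAR10(root="./data", train=True, download=True)
num_labeled = 4000  # illustrative benchmark setting, as in Table 1

rng = np.random.default_rng(0)
perm = rng.permutation(len(train_set))
labeled_idx = perm[:num_labeled]      # labels are used for these indices
unlabeled_idx = perm[num_labeled:]    # labels are discarded for these
```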
Results
  • The authors provide the results for the CIFAR-10 and SVHN datasets using the CNN-13 architecture in Table 1 and Table 2, respectively.
  • To justify the use of an SSL algorithm, one must compare its performance against the state-of-the-art supervised learning algorithm (Oliver et al, 2018).
  • To this end, the authors compare the method against two state-of-the-art supervised learning algorithms (Zhang et al, 2018; Verma et al, 2018), denoted as Supervised (Mixup) and Supervised (Manifold Mixup) in Tables 1 and 2, respectively.
  • The ICT method passes this test by a wide margin, often yielding a two-fold reduction in test error on CIFAR-10 (Table 1) and a four-fold reduction on SVHN (Table 2).
Conclusion
  • Machine learning is having a transformative impact on diverse areas, yet its application is often limited by the amount of available labeled data.
  • Progress in semi-supervised learning techniques holds promise for those applications where labels are expensive to obtain.
  • The authors have proposed a simple but efficient semi-supervised learning algorithm, Interpolation Consistency Training (ICT), which has two advantages over previous approaches to semi-supervised learning.
  • It uses almost no additional computation, as opposed to computing adversarial perturbations or training generative models.
  • A direction for future work is to better understand the theoretical properties of interpolation-based regularizers in the SSL paradigm.
Tables
  • Table 1: Error rates (%) on CIFAR-10 using the CNN-13 architecture. We ran three trials for ICT
  • Table 2: Error rates (%) on SVHN using the CNN-13 architecture. We ran three trials for ICT
  • Table 3: Results on CIFAR-10 (4000 labels) and SVHN (1000 labels), in test error (%). All results use the same standardized architecture (WideResNet-28-2). Each experiment was run for three trials. † refers to the results reported in (Oliver et al, 2018). We did not conduct any hyperparameter search and used the best hyperparameters found in the experiments of Tables 1 and 2 for CIFAR-10 (4000 labels) and SVHN (1000 labels)
Related work
  • This work builds on two threads of research: consistency-regularization for semi-supervised learning and interpolation-based regularizers.

    On the one hand, consistency-regularization semi-supervised learning methods (Sajjadi et al, 2016; Laine & Aila, 2016; Tarvainen & Valpola, 2017; Miyato et al, 2018; Luo et al, 2018; Athiwaratkun et al, 2019) encourage realistic perturbations u + δ of unlabeled samples u not to change the model predictions fθ(u). These methods are motivated by the low-density separation assumption (Chapelle et al, 2010), and as such push the decision boundary to lie in the low-density regions of the input space, achieving larger classification margins. ICT differs from these approaches in two aspects. First, ICT chooses perturbations in the direction of another randomly chosen unlabeled sample, avoiding expensive gradient computations. Second, when interpolating between distant points, the regularization effect of ICT applies to larger regions of the input space.
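As a worked comparison of the two objectives sketched in this paragraph (notation as above; the squared-error distance and the mean-teacher parameters θ′ are the usual choices in this literature, stated here as assumptions rather than quoted from the paper):

```latex
% Standard consistency regularization: stability under a small
% realistic perturbation \delta of an unlabeled point u
\mathcal{L}_{\mathrm{cons}} \;=\; \mathbb{E}_{u}\,
  \big\lVert f_{\theta}(u + \delta) - f_{\theta}(u) \big\rVert^{2}

% ICT: the "perturbation" moves u_j toward another unlabeled sample u_k,
% and the target interpolates the (teacher) predictions at the endpoints
\mathcal{L}_{\mathrm{ICT}} \;=\; \mathbb{E}_{u_j, u_k, \lambda}\,
  \big\lVert f_{\theta}\!\big(\lambda u_j + (1-\lambda)u_k\big)
  - \big(\lambda f_{\theta'}(u_j) + (1-\lambda) f_{\theta'}(u_k)\big) \big\rVert^{2}
```

Because u_k is simply another sample from the unlabeled batch, no gradient-based search for the perturbation δ is needed, which is the computational advantage mentioned above.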
Funding
  • Vikas Verma was supported by Academy of Finland project 13312683 / Raiko Tapani AT kulut
Reference
  • Athiwaratkun, B., Finzi, M., Izmailov, P., and Wilson, A. G. There are many consistent explanations of unlabeled data: Why you should average. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=rkgKBhA5Y7.
  • Berthelot, D., Raffel, C., Roy, A., and Goodfellow, I. Understanding and improving interpolation in autoencoders via an adversarial regularizer. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=S1fQSiCcYm.
  • Chapelle, O., Schlkopf, B., and Zien, A. Semi-Supervised Learning. The MIT Press, 1st edition, 2010. ISBN 0262514125, 9780262514125.
  • Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., and Ha, D. Deep learning for classical japanese literature. arXiv preprint arXiv:1812.01718, 2018.
  • Dumoulin, V., Belghazi, I., Poole, B., Mastropietro, O., Lamb, A., Arjovsky, M., and Courville, A. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014.
  • Laine, S. and Aila, T. Temporal ensembling for semi-supervised learning. CoRR, abs/1610.02242, 2016. URL http://arxiv.org/abs/1610.02242.
  • Lecouat, B., Foo, C.-S., Zenati, H., and Chandrasekhar, V. Manifold regularization with gans for semi-supervised learning. arXiv preprint arXiv:1807.04307, 2018.
  • LeCun, Y., Bengio, Y., and Hinton, G. Deep learning. Nature, 521(7553):436–444, 2015.
  • Loshchilov, I. and Hutter, F. SGDR: stochastic gradient descent with restarts. CoRR, abs/1608.03983, 2016. URL http://arxiv.org/abs/1608.03983.
  • Luo, Y., Zhu, J., Li, M., Ren, Y., and Zhang, B. Smooth neighbors on teacher graphs for semisupervised learning. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 8896–8905, 2018.
  • Miyato, T., Maeda, S.-i., Koyama, M., and Ishii, S. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
  • Nakkiran, P. Adversarial robustness may be at odds with simplicity. arXiv preprint arXiv:1901.00532, 2019.
  • Oliver, A., Odena, A., Raffel, C., Cubuk, E. D., and Goodfellow, I. J. Realistic Evaluation of Deep Semi-Supervised Learning Algorithms. In Neural Information Processing Systems (NIPS), 2018.
  • Park, S., Park, J., Shin, S.-J., and Moon, I.-C. Adversarial dropout for supervised and semi-supervised learning. AAAI, 2018.
  • Radford, A., Metz, L., and Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • Sajjadi, M., Javanmardi, M., and Tasdizen, T. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pp. 1171–1179, USA, 2016. Curran Associates Inc. ISBN 978-1-5108-3881-9. URL http://dl.acm.org/citation.cfm?id=3157096.3157227.
  • Shawe-Taylor, J., Bartlett, P., Williamson, R. C., and Anthony, M. A framework for structural risk minimisation. pp. 68–76, 1996. doi: 10.1145/238061.238070.
  • Tarvainen, A. and Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems 30, pp. 1195–1204, 2017.
  • Tokozume, Y., Ushiku, Y., and Harada, T. Between-class learning for image classification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., and Madry, A. Robustness may be at odds with accuracy. stat, 1050:11, 2018.
  • Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Courville, A., Lopez-Paz, D., and Bengio, Y. Manifold Mixup: Better Representations by Interpolating Hidden States. arXiv e-prints, art. arXiv:1806.05236, Jun 2018.
  • Zagoruyko, S. and Komodakis, N. Wide residual networks. In Richard C. Wilson, E. R. H. and Smith, W. A. P. (eds.), Proceedings of the British Machine Vision Conference (BMVC), pp. 87.1– 87.12. BMVA Press, September 2016. ISBN 1-901725-59-6. doi: 10.5244/C.30.87. URL https://dx.doi.org/10.5244/C.30.87.
  • Zhang, H., Cisse, M., Dauphin, Y. N., and Lopez-Paz, D. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=r1Ddp1-Rb.