Learning by Association - A versatile semi-supervised training method for neural networks

Philip Häusser
Alexander Mordvintsev

CVPR, 2017.

Keywords:
supervised training scheme; available unlabeled data; Street View House Numbers; unsupervised feature learning; deep generative model

Abstract:

In many real-world scenarios, labeled data for a specific machine learning task is costly to obtain. Semi-supervised training methods make use of abundantly available unlabeled data and a smaller number of labeled examples. We propose a new framework for semi-supervised training of deep neural networks inspired by learning in humans. Associations are made from embeddings of labeled samples to those of unlabeled ones and back.

Introduction
  • A child is able to learn new concepts quickly and without the need for millions of examples that are pointed out individually.
  • In terms of training computers to perform similar tasks, deep neural networks have demonstrated superior performance among machine learning models ([20, 18, 10]).
  • These networks, however, have been trained very differently from a learning child: they require labels for every training example, following a purely supervised training scheme.
  • It is therefore desirable to train machine learning models without labels or with only a fraction of the data labeled.
Highlights
  • A child is able to learn new concepts quickly and without the need for millions of examples that are pointed out individually
  • Neural networks are defined by huge numbers of parameters that need to be optimized
  • Since we deviated from the testing protocol suggested by the data set creators, we do not want to claim state of the art for this experiment, but we do consider it a promising result; [13] achieved 76.3% following the proposed protocol
  • A meerkat looking to the right is associated with a dog looking in the same direction, or with a raccoon with dark spots around the eyes
  • We have demonstrated how adding unlabeled data improves results dramatically, in particular when the number of labeled samples is small, surpassing the state of the art on Street View House Numbers (SVHN) with 500 labeled samples
  • We have proposed a novel semi-supervised training scheme that is fully differentiable and easy to add to existing end-to-end settings
Methods
  • Figure 4 shows the 5 nearest neighbors for samples from the unlabeled training set.
  • The cosine similarity is shown in the top left corner of each association.
  • Note that these numbers are not softmaxed.
  • Embeddings of classes not present in the labeled training set do not seem to group together well; rather, they tend to be close to known classes.
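The nearest-neighbor inspection described in the bullets above can be reproduced in a few lines. The following is a minimal sketch (not the authors' released code) of retrieving, for each unlabeled embedding, the 5 nearest embeddings under cosine similarity, as in Figure 4; the array names and the random toy data are assumptions for illustration.

```python
# Retrieve, for each query embedding, the k most similar candidate embeddings
# under cosine similarity. Illustrative sketch only.
import numpy as np

def cosine_nearest_neighbors(emb_queries, emb_candidates, k=5):
    """Return indices and raw (non-softmaxed) cosine similarities of the
    k most similar candidate embeddings for every query embedding."""
    # L2-normalize so a plain dot product equals the cosine similarity.
    q = emb_queries / np.linalg.norm(emb_queries, axis=1, keepdims=True)
    c = emb_candidates / np.linalg.norm(emb_candidates, axis=1, keepdims=True)
    sims = q @ c.T                                    # (n_queries, n_candidates)
    top_k = np.argsort(-sims, axis=1)[:, :k]          # indices of the k best matches
    top_sims = np.take_along_axis(sims, top_k, axis=1)
    return top_k, top_sims

# Toy usage with random embeddings standing in for the network's outputs.
rng = np.random.default_rng(0)
unlabeled = rng.normal(size=(8, 128))    # query embeddings (unlabeled set)
labeled = rng.normal(size=(100, 128))    # candidate embeddings
idx, sims = cosine_nearest_neighbors(unlabeled, labeled, k=5)
print(idx.shape, sims.shape)             # (8, 5) (8, 5)
```

Because the vectors are L2-normalized first, the reported similarities are plain cosine scores, consistent with the note above that the numbers are not softmaxed.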
Results
  • For cases with few labeled data, the training scheme outperforms the current state of the art on SVHN.
  • Extensive experiments demonstrate that the proposed method improves performance by up to 64% compared to the purely supervised case.
Conclusion
  • The authors have proposed a novel semi-supervised training scheme that is fully differentiable and easy to add to existing end-to-end settings.
  • The key idea is to encourage cycle-consistent association chains from embeddings of labeled samples, over embeddings of unlabeled samples, back to labeled samples of the same class.
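To make the cycle-consistency idea concrete, here is a minimal NumPy sketch of one plausible formulation of the association ("walker") and visit losses, assuming dot-product similarities between labeled and unlabeled embeddings and row-wise softmax transition probabilities. Function and variable names are illustrative, not the authors' TensorFlow implementation, and in the full training setup this term is added to a standard classification loss on the labeled batch.

```python
# Cycle-consistent association sketch: walk from labeled embeddings (A) to
# unlabeled embeddings (B) and back, penalize round trips that end at a
# different class ("walker" loss), and add a "visit" term that encourages all
# unlabeled samples to be visited. Illustrative assumptions, not the authors' code.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def association_loss(emb_labeled, labels, emb_unlabeled, visit_weight=1.0):
    # Pairwise similarities between labeled (A) and unlabeled (B) embeddings.
    M = emb_labeled @ emb_unlabeled.T              # (n_labeled, n_unlabeled)
    p_ab = softmax(M, axis=1)                      # transition probabilities A -> B
    p_ba = softmax(M.T, axis=1)                    # transition probabilities B -> A
    p_aba = p_ab @ p_ba                            # round-trip probabilities A -> B -> A

    # Walker loss: a correct cycle may end at *any* labeled sample of the
    # starting class, so the target distributes mass uniformly over them.
    same_class = (labels[:, None] == labels[None, :]).astype(float)
    targets = same_class / same_class.sum(axis=1, keepdims=True)
    walker_loss = -np.mean(np.sum(targets * np.log(p_aba + 1e-8), axis=1))

    # Visit loss: the average probability of visiting each unlabeled sample
    # should be close to uniform (cross-entropy against a uniform target).
    visit_prob = p_ab.mean(axis=0)
    visit_loss = -np.mean(np.log(visit_prob + 1e-8))

    return walker_loss + visit_weight * visit_loss

# Toy usage with random embeddings standing in for the network's outputs.
rng = np.random.default_rng(0)
loss = association_loss(rng.normal(size=(20, 64)),      # labeled embeddings
                        rng.integers(0, 10, size=20),    # their class labels
                        rng.normal(size=(100, 64)),      # unlabeled embeddings
                        visit_weight=1.0)
print(float(loss))
```

Round trips that return to a labeled sample of a different class receive zero target probability and are therefore penalized, while the visit term (whose weight is studied in Table 4) discourages the walker from collapsing onto a few easy unlabeled samples.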
Tables
  • Table1: Results on MNIST. Error (%) on the test set (lower is better). Standard deviations in parentheses. †: Results on permutation-invariant MNIST
  • Table2: Results of comparable methods on SVHN. Error (%) on the test set (lower is better). Standard deviations in parentheses. *) Results provided by authors
  • Table3: Results on SVHN with different amounts of (total) labeled/unlabeled training data. Error (%) on the test set (lower is better). Standard deviations in parentheses
  • Table4: Effect of visit loss. Error (%) on the respective test sets (lower is better) for different values of the visit loss weight. Reported are the medians of the minimum error rates throughout training, with standard deviations in parentheses. Experiments were run with 1,000 randomly chosen labeled samples as the supervised data set
  • Table5: Domain adaptation. Errors (%) on the target test sets (lower is better). “Source only” and “target only” refer to training only on the respective data set without domain adaptation. “DA” and “DS” stand for Domain-Adversarial Training and Domain Separation Networks, respectively. The numbers in parentheses indicate how much of the gap between the lower and upper bounds was covered
Related work
  • The challenge of harnessing unlabeled data for the training of neural networks has been tackled with a variety of methods. Although this work follows a semi-supervised approach, its motivation also relates it to purely unsupervised methods. A third category of related work comprises generative approaches.

    2.1. Semi-supervised training

    The semi-supervised training paradigm has not been among the most popular methods for neural networks in the past. It has, however, been successfully applied to SVMs [14], where unlabeled samples serve as additional regularizers: decision boundaries are required to keep a broad margin to unlabeled samples as well.

    One training scheme applicable to neural nets is to bootstrap the model with additional labeled data obtained from the model’s own predictions. [22] introduce pseudo-labels for unlabeled samples, which are simply the class with the maximum predicted probability. The model is then trained on labeled and pseudo-labeled samples simultaneously. In combination with a denoising auto-encoder and dropout, this approach yields competitive results on MNIST.
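As a rough illustration of the pseudo-label scheme described above (a hedged sketch with hypothetical names, not the exact recipe of [22]), the unlabeled batch is labeled with the argmax of the model's own predictions and contributes a weighted cross-entropy term alongside the supervised one:

```python
# Pseudo-label sketch: treat the class with maximum predicted probability on
# unlabeled data as its label and add a weighted cross-entropy term.
import numpy as np

def pseudo_labels(probs_unlabeled):
    """probs_unlabeled: (batch, n_classes) predicted class probabilities."""
    return np.argmax(probs_unlabeled, axis=1)

def combined_loss(probs_labeled, y_labeled, probs_unlabeled, alpha=0.1):
    """Supervised cross-entropy plus alpha-weighted cross-entropy against the
    model's own pseudo-labels (in [22] alpha is ramped up over training; a
    fixed constant is used here for brevity)."""
    y_pseudo = pseudo_labels(probs_unlabeled)
    ce_labeled = -np.mean(np.log(probs_labeled[np.arange(len(y_labeled)), y_labeled] + 1e-8))
    ce_pseudo = -np.mean(np.log(probs_unlabeled[np.arange(len(y_pseudo)), y_pseudo] + 1e-8))
    return ce_labeled + alpha * ce_pseudo

# Toy usage with random "predictions" standing in for model outputs.
rng = np.random.default_rng(0)
pl = rng.dirichlet(np.ones(10), size=32)    # labeled-batch predictions
pu = rng.dirichlet(np.ones(10), size=128)   # unlabeled-batch predictions
print(combined_loss(pl, rng.integers(0, 10, size=32), pu))
```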
References
  • [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
  • [2] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan. Domain separation networks. arXiv preprint arXiv:1608.06019, 2016.
  • [3] D.-A. Clevert, T. Unterthiner, and S. Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289, 2015.
  • [4] A. Coates, H. Lee, and A. Y. Ng. An analysis of single-layer networks in unsupervised feature learning. Ann Arbor, 1001(48109):2, 2010.
  • [5] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug):2493–2537, 2011.
  • [6] C. Doersch, A. Gupta, and A. A. Efros. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, pages 1422–1430, 2015.
  • [7] A. Dosovitskiy, J. T. Springenberg, M. Riedmiller, and T. Brox. Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems, pages 766–774, 2014.
  • [8] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59):1–35, 2016.
  • [9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
  • [10] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
  • [11] I. Higgins, L. Matthey, X. Glorot, A. Pal, B. Uria, C. Blundell, S. Mohamed, and A. Lerchner. Early visual concept learning with unsupervised deep learning. arXiv preprint arXiv:1606.05579, 2016.
  • [12] G. Hinton. A practical guide to training restricted Boltzmann machines. Momentum, 9(1):926, 2010.
  • [13] C. Huang, C. Change Loy, and X. Tang. Unsupervised learning of discriminative attributes and visual representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5175–5184, 2016.
  • [14] T. Joachims. Transductive inference for text classification using support vector machines. In ICML, volume 99, pages 200–209, 1999.
  • [15] Y. Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
  • [16] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [17] D. P. Kingma, S. Mohamed, D. Jimenez Rezende, and M. Welling. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems 27, pages 3581–3589. Curran Associates, Inc., 2014.
  • [18] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
  • [19] Q. V. Le. Building high-level features using large scale unsupervised learning. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 8595–8598. IEEE, 2013.
  • [20] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [21] Y. LeCun, C. Cortes, and C. J. Burges. The MNIST database of handwritten digits, 1998.
  • [22] D.-H. Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, volume 3, page 2, 2013.
  • [23] L. Maaløe, C. K. Sønderby, S. K. Sønderby, and O. Winther. Auxiliary deep generative models. arXiv preprint arXiv:1602.05473, 2016.
  • [24] T. Miyato, S.-i. Maeda, M. Koyama, K. Nakae, and S. Ishii. Distributional smoothing by virtual adversarial examples. arXiv preprint arXiv:1507.00677, 2015.
  • [25] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, page 4. Granada, Spain, 2011.
  • [26] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • [27] M. Ranzato and M. Szummer. Semi-supervised learning of compact document representations with deep networks. In Proceedings of the 25th International Conference on Machine Learning, pages 792–799. ACM, 2008.
  • [28] A. Rasmus, M. Berglund, M. Honkala, H. Valpola, and T. Raiko. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pages 3546–3554, 2015.
  • [29] K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. In European Conference on Computer Vision, pages 213–226. Springer, 2010.
  • [30] M. Sajjadi, M. Javanmardi, and T. Tasdizen. Mutual exclusivity loss for semi-supervised deep learning. In 2016 IEEE International Conference on Image Processing (ICIP), pages 1908–1912. IEEE, 2016.
  • [31] M. Sajjadi, M. Javanmardi, and T. Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. arXiv preprint arXiv:1606.04586, 2016.
  • [32] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. arXiv preprint arXiv:1606.03498, 2016.
  • [33] P. Smolensky. Information processing in dynamical systems: Foundations of harmony theory. Technical report, DTIC Document, 1986.
  • [34] N. Srivastava, E. Mansimov, and R. Salakhutdinov. Unsupervised learning of video representations using LSTMs. CoRR, abs/1502.04681, 2015.
  • [35] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
  • [36] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. arXiv preprint arXiv:1412.4729, 2014.
  • [37] J. Weston, F. Ratle, H. Mobahi, and R. Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages 639–655. Springer, 2012.
  • [38] Z. Yang, R. Salakhutdinov, and W. Cohen. Multi-task cross-lingual sequence tagging from scratch. arXiv preprint arXiv:1603.06270, 2016.
  • [39] J. Zhao, M. Mathieu, R. Goroshin, and Y. LeCun. Stacked what-where auto-encoders. arXiv preprint arXiv:1506.02351, 2015.