Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks

International Conference on Learning Representations (ICLR), 2016.

Keywords:
unknown label, generative model, discriminative classifier, maximum margin clustering, conditional generative adversarial

Abstract:

In this paper we present a method for learning a discriminative classifier from unlabeled or partially labeled data. Our approach is based on an objective function that trades off mutual information between observed examples and their predicted categorical class distribution against robustness of the classifier to an adversarial generative model.
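Concretely (a sketch in LaTeX following the formulation in the paper, where H denotes Shannon entropy, D is the K-class discriminator, G the generator, and P(z) the noise prior), the two objectives can be written as:

    \begin{aligned}
    \max_{D}\quad & \mathcal{H}\big[p(y \mid D)\big]
        - \mathbb{E}_{\mathbf{x} \sim \mathcal{X}}\Big[\mathcal{H}\big[p(y \mid \mathbf{x}, D)\big]\Big]
        + \mathbb{E}_{\mathbf{z} \sim P(\mathbf{z})}\Big[\mathcal{H}\big[p(y \mid G(\mathbf{z}), D)\big]\Big] \\
    \min_{G}\quad & -\mathcal{H}\big[p(y \mid D)\big]
        + \mathbb{E}_{\mathbf{z} \sim P(\mathbf{z})}\Big[\mathcal{H}\big[p(y \mid G(\mathbf{z}), D)\big]\Big]
    \end{aligned}

Intuitively, the discriminator should be certain about the class of real examples, uncertain about generated ones, and use all K classes equally often, while the generator tries to produce samples that fall confidently into one of the K classes.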

Introduction
  • Learning non-linear classifiers from unlabeled or only partially labeled data is a long-standing problem in machine learning.
  • Each example x is assumed to belong to one of K classes, with y ∈ {1, ..., K} denoting its (unknown) label.
  • By utilizing both labeled and unlabeled examples from the data distribution, one hopes to learn a representation that captures this shared structure.
  • Such a representation might subsequently help classifiers trained using only a few labeled examples to generalize to parts of the data distribution about which they would otherwise have no information.
  • Unsupervised categorization of data is an often sought-after tool for discovering groups in datasets with unknown class structure; a minimal sketch of the information-theoretic criterion behind this appears after this list.
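As referenced above, the following is a minimal numpy sketch (our illustration, not code from the paper) of the mutual-information criterion I(x; y) = H[p(y)] − E_x[H[p(y|x)]], estimated from a batch of softmax outputs:

    import numpy as np

    def marginal_entropy(probs):
        # H[p(y)]: entropy of the empirical class marginal; large when
        # all K classes are used equally often across the batch.
        p_y = probs.mean(axis=0)
        return -np.sum(p_y * np.log(p_y + 1e-12))

    def conditional_entropy(probs):
        # E_x[H[p(y|x)]]: average per-example entropy; small when the
        # classifier is confident about each individual example.
        h = -np.sum(probs * np.log(probs + 1e-12), axis=1)
        return h.mean()

    # Stand-in for softmax outputs of a classifier on a batch of 128 examples.
    probs = np.random.dirichlet(np.ones(10), size=128)
    mi_estimate = marginal_entropy(probs) - conditional_entropy(probs)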
Highlights
  • Learning non-linear classifiers from unlabeled or only partially labeled data is a long-standing problem in machine learning.
  • We trained unsupervised categorical generative adversarial networks (CatGANs) on MNIST, LFW and CIFAR-10 and plot samples generated by these models in Figure 3.
  • As an additional quantitative evaluation we compared the unsupervised CatGAN model trained on MNIST with other generative models based on the log-likelihood of generated samples.
  • In brief: the CatGAN model performs comparably to the best existing algorithms, achieving a log-likelihood of 237 ± 6 on MNIST; in comparison, Goodfellow et al. (2014) report 225 ± 2 for generative adversarial networks. Note that this does not necessarily mean that the CatGAN model is superior, as comparing generative models with respect to log-likelihood measured by a Parzen-window estimate can be misleading (see Theis et al. (2015) for a recent in-depth discussion).
  • We have presented CatGANs, a framework for robust unsupervised and semi-supervised learning.
  • We found the proposed method to yield classification performance competitive with state-of-the-art results for semi-supervised image classification, and further confirmed that the generator, which is learned alongside the classifier, is capable of generating images of high visual fidelity; a sketch of the semi-supervised loss follows this list.
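As a rough illustration of the semi-supervised variant mentioned above, the discriminator loss can combine the unsupervised entropy terms with a cross-entropy term on the few labeled examples. This PyTorch-style sketch uses assumed helper names and an assumed weighting lam; it is not the paper's exact code:

    import torch
    import torch.nn.functional as F

    def entropy(p, eps=1e-12):
        # Shannon entropy of each row of a batch of probability vectors.
        return -(p * (p + eps).log()).sum(dim=1)

    def discriminator_loss(p_real, p_fake, p_labeled, y_labeled, lam=1.0):
        # p_real, p_fake, p_labeled: softmax outputs of shape (N, K).
        p_marg = p_real.mean(dim=0, keepdim=True)  # empirical class marginal
        loss = (
            -entropy(p_marg).squeeze()   # maximize H[p(y)] over real data
            + entropy(p_real).mean()     # be certain on real examples
            - entropy(p_fake).mean()     # be uncertain on generated examples
        )
        # Supervised term on the handful of labeled examples.
        ce = F.nll_loss((p_labeled + 1e-12).log(), y_labeled)
        return loss + lam * ce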
Results
  • EVALUATION OF THE GENERATIVE MODEL: the authors qualitatively evaluate the capabilities of the generative model.
  • The authors trained an unsupervised CatGAN on MNIST, LFW and CIFAR-10 and plot samples generated by these models in Figure 3.
  • As an additional quantitative evaluation, the authors compared the unsupervised CatGAN model trained on MNIST with other generative models based on the log-likelihood of generated samples.
  • The CatGAN model performs comparably to the best existing algorithms, achieving a log-likelihood of 237 ± 6 on MNIST; in comparison, Goodfellow et al. (2014) report 225 ± 2 for generative adversarial networks.
  • Note that this does not necessarily mean that the CatGAN model is superior, as comparing generative models with respect to log-likelihood measured by a Parzen-window estimate can be misleading (see Theis et al. (2015) for a recent in-depth discussion); a sketch of such an estimate follows this list.
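The Parzen-window protocol referred to in this list can be sketched as follows (a minimal illustration assuming arrays of generated samples and flattened test images; the paper's exact bandwidth-selection procedure may differ):

    import numpy as np
    from sklearn.neighbors import KernelDensity

    def parzen_log_likelihood(generated, test, bandwidths=np.logspace(-2, 0, 10)):
        # Fit a Gaussian KDE to generated samples and report the mean
        # log-likelihood of held-out test data under it (in nats).
        val, held_out = test[:1000], test[1000:]  # assumes len(test) > 1000
        best_bw = max(
            bandwidths,
            key=lambda bw: KernelDensity(bandwidth=bw).fit(generated).score(val),
        )
        kde = KernelDensity(bandwidth=best_bw).fit(generated)
        return kde.score_samples(held_out).mean()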
Conclusion
  • The authors have presented categorical generative adversarial networks (CatGANs), a framework for robust unsupervised and semi-supervised learning.
  • The method combines neural network classifiers with an adversarial generative model that regularizes a discriminatively trained classifier.
  • The authors found the proposed method to yield classification performance competitive with state-of-the-art results for semi-supervised image classification, and further confirmed that the generator, which is learned alongside the classifier, is capable of generating images of high visual fidelity.
Tables
  • Table 1: Classification error, in percent, for the permutation-invariant MNIST problem with a reduced number of labels. Results are averaged over 10 different sets of labeled examples.
  • Table 2: Classification error, in percent, for different learning methods in combination with convolutional neural networks (CNNs) with a reduced number of labels.
  • Table 3: Classification error for different methods on the CIFAR-10 dataset (without data augmentation) for the full dataset and a reduced set of 400 labeled examples per class.
  • Table 4: The discriminator and generator CNNs used for MNIST.
  • Table 5: The discriminator and generator CNNs used for CIFAR-10 (an illustrative sketch of such a pair follows this list).
  • Table 6: Comparison between different generative models on MNIST.
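For orientation, here is an illustrative PyTorch sketch of a CatGAN-style generator/discriminator pair for 28×28 MNIST in the spirit of Tables 4 and 5; all layer sizes are placeholders rather than the exact configurations listed in the tables:

    import torch.nn as nn

    K = 10  # number of categories

    # Generator: 100-dimensional noise vector -> 28x28 image.
    generator = nn.Sequential(
        nn.Linear(100, 128 * 7 * 7),
        nn.ReLU(),
        nn.Unflatten(1, (128, 7, 7)),
        nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 7x7 -> 14x14
        nn.ReLU(),
        nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),    # 14x14 -> 28x28
        nn.Sigmoid(),
    )

    # Discriminator: image -> K-way class logits (softmax gives p(y|x)),
    # replacing the binary real/fake output of a standard GAN.
    discriminator = nn.Sequential(
        nn.Conv2d(1, 64, 4, stride=2, padding=1),    # 28x28 -> 14x14
        nn.LeakyReLU(0.1),
        nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 14x14 -> 7x7
        nn.LeakyReLU(0.1),
        nn.Flatten(),
        nn.Linear(128 * 7 * 7, K),
    )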
Funding
  • This work was funded by the German Research Foundation (DFG) within the priority program “Autonomous learning” (SPP 1597).
References
  • Bachman, Phil, Alsharif, Ouais, and Precup, Doina. Learning with pseudo-ensembles. In Advances in Neural Information Processing Systems (NIPS) 27, pp. 3365–3373. Curran Associates, Inc., 2014.
  • Bastien, Frédéric, Lamblin, Pascal, Pascanu, Razvan, Bergstra, James, Goodfellow, Ian J., Bergeron, Arnaud, Bouchard, Nicolas, and Bengio, Yoshua. Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012.
  • Bengio, Yoshua, Thibodeau-Laufer, Eric, and Yosinski, Jason. Deep generative stochastic networks trainable by backprop. In Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.
  • Bergstra, James, Breuleux, Olivier, Bastien, Frédéric, Lamblin, Pascal, Pascanu, Razvan, Desjardins, Guillaume, Turian, Joseph, Warde-Farley, David, and Bengio, Yoshua. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), 2010.
  • Bridle, John S., Heading, Anthony J. R., and MacKay, David J. C. Unsupervised classifiers, mutual information and phantom targets. In Advances in Neural Information Processing Systems (NIPS) 4. MIT Press, 1992.
  • Denton, Emily, Chintala, Soumith, Szlam, Arthur, and Fergus, Rob. Deep generative image models using a laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems (NIPS) 28, 2015.
  • Dieleman, Sander, Schlüter, Jan, Raffel, Colin, Olson, Eben, Sønderby, Søren Kaae, Nouri, Daniel, Maturana, Daniel, Thoma, Martin, Battenberg, Eric, Kelly, Jack, De Fauw, Jeffrey, Heilman, Michael, et al. Lasagne: First release, August 2015. URL http://dx.doi.org/10.5281/zenodo.27878.
  • Dosovitskiy, A., Springenberg, J. T., and Brox, T. Learning to generate chairs with convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • Dosovitskiy, Alexey, Springenberg, Jost Tobias, Riedmiller, Martin, and Brox, Thomas. Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS) 27. Curran Associates, Inc., 2014.
  • Ester, Martin, Kriegel, Hans-Peter, Sander, Jörg, and Xu, Xiaowei. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. of 2nd International Conference on Knowledge Discovery and Data Mining (KDD), 1996.
  • Fei-Fei, L., Fergus, R., and Perona, P. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:594–611, April 2006.
  • Funk, Simon. SMORMS3 - blog entry: RMSprop loses to SMORMS3 - beware the epsilon! http://sifter.org/simon/journal/20150420.html, 2015.
  • Gauthier, Jon. Conditional generative adversarial networks for face generation. Class Project for Stanford CS231N, 2014.
  • Goodfellow, Ian, Mirza, Mehdi, Courville, Aaron, and Bengio, Yoshua. Multi-prediction deep boltzmann machines. In Advances in Neural Information Processing Systems (NIPS) 26. Curran Associates, Inc., 2013.
  • Goodfellow, Ian, Pouget-Abadie, Jean, Mirza, Mehdi, Xu, Bing, Warde-Farley, David, Ozair, Sherjil, Courville, Aaron, and Bengio, Yoshua. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS) 27. Curran Associates, Inc., 2014.
  • Grandvalet, Yves and Bengio, Yoshua. Semi-supervised learning by entropy minimization. In Advances in Neural Information Processing Systems (NIPS) 17. MIT Press, 2005.
  • Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, July 2006.
  • Hinton, Geoffrey E., Srivastava, Nitish, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan R. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580v3, 2012. URL http://arxiv.org/abs/1207.0580v3.
  • Huang, Gary B., Ramesh, Manu, Berg, Tamara, and Learned-Miller, Erik. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
  • Hui, Ka Y. Direct modeling of complex invariances for visual object features. In Proceedings of the 30th International Conference on Machine Learning (ICML). JMLR Workshop and Conference Proceedings, 2013.
  • Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML). JMLR Proceedings, 2015.
  • Kingma, Diederik and Ba, Jimmy. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
  • Kingma, Diederik P., Mohamed, Shakir, Jimenez Rezende, Danilo, and Welling, Max. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems (NIPS) 27. Curran Associates, Inc., 2014.
  • Krause, Andreas, Perona, Pietro, and Gomes, Ryan G. Discriminative clustering by regularized information maximization. In Advances in Neural Information Processing Systems (NIPS) 23. MIT Press, 2010.
  • Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.
  • LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989.
  • Lee, Dong-Hyun. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, 2013.
  • Li, Yujia, Swersky, Kevin, and Zemel, Richard S. Generative moment matching networks. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.
  • Mirza, Mehdi and Osindero, Simon. Conditional generative adversarial nets. CoRR, abs/1411.1784, 2014. URL http://arxiv.org/abs/1411.1784.
  • Osendorfer, Christian, Soyer, Hubert, and van der Smagt, Patrick. Image super-resolution with fast approximate convolutional sparse coding. In ICONIP, Lecture Notes in Computer Science. Springer International Publishing, 2014.
  • Rasmus, Antti, Valpola, Harri, Honkala, Mikko, Berglund, Mathias, and Raiko, Tapani. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems (NIPS) 28, 2015.
  • Rifai, Salah, Dauphin, Yann N, Vincent, Pascal, Bengio, Yoshua, and Muller, Xavier. The manifold tangent classifier. In Advances in Neural Information Processing Systems (NIPS) 24. Curran Associates, Inc., 2011.
  • Salakhutdinov, Ruslan and Hinton, Geoffrey. Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.
  • Schaul, Tom, Zhang, Sixin, and LeCun, Yann. No More Pesky Learning Rates. In International Conference on Machine Learning (ICML), 2013.
  • Springenberg, Jost Tobias, Dosovitskiy, Alexey, Brox, Thomas, and Riedmiller, Martin. Striving for simplicity: The all convolutional net. CoRR, abs/1412.6806, 2015. URL http://arxiv.org/abs/1412.6806.
  • Theis, Lucas, van den Oord, Aaron, and Bethge, Matthias. A note on the evaluation of generative models. CoRR, abs/1511.01844, 2015. URL http://arxiv.org/abs/1511.01844.
  • Tieleman, T. and Hinton, G. Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
  • Vincent, Pascal, Larochelle, Hugo, Bengio, Yoshua, and Manzagol, Pierre-Antoine. Extracting and composing robust features with denoising autoencoders. In Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML), 2008.
  • Weston, J., Ratle, F., Mobahi, H., and Collobert, R. Deep learning via semi-supervised embedding. In Montavon, G., Orr, G., and Müller, K.-R. (eds.), Neural Networks: Tricks of the Trade. Springer, 2012.
  • Xu, Linli, Neufeld, James, Larson, Bryce, and Schuurmans, Dale. Maximum margin clustering. In Advances in Neural Information Processing Systems (NIPS) 17. MIT Press, 2005.
  • Zeiler, Matthew D., Taylor, Graham W., and Fergus, Rob. Adaptive deconvolutional networks for mid and high level feature learning. In IEEE International Conference on Computer Vision, ICCV, pp. 2018–2025, 2011.
  • Zhao, Junbo, Mathieu, Michael, Goroshin, Ross, and Lecun, Yann. Stacked what-where autoencoders. CoRR, abs/1506.02351, 2015. URL http://arxiv.org/abs/1506.02351.