# Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks

International Conference on Learning Representations (ICLR), 2016.

Abstract:

In this paper we present a method for learning a discriminative classifier from unlabeled or partially labeled data. Our approach is based on an objective function that trades off mutual information between observed examples and their predicted categorical class distribution, against robustness of the classifier to an adversarial generative model. …
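The trade-off described in the abstract can be illustrated with the entropy quantities a CatGAN-style discriminator works with. The following is a minimal NumPy sketch of those terms; the function names and toy inputs are our own illustration, not the paper's implementation:

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy (in nats) of categorical distributions."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def catgan_discriminator_terms(p_real, p_fake):
    """Entropy terms of a CatGAN-style discriminator objective.

    p_real, p_fake: (N, K) softmax outputs of the classifier on real
    and generated samples. The discriminator wants:
      - high marginal entropy over real data (use all K classes),
      - low conditional entropy on real data (confident predictions),
      - high conditional entropy on generated data (uncertain on fakes).
    """
    marginal = entropy(p_real.mean(axis=0))      # H[p(y|D)] over real data
    cond_real = entropy(p_real, axis=1).mean()   # E_x H[p(y|x,D)]
    cond_fake = entropy(p_fake, axis=1).mean()   # E_z H[p(y|G(z),D)]
    return marginal, cond_real, cond_fake

# Toy check: confident, class-balanced predictions on real data and
# uniform predictions on fakes are exactly what the discriminator favors.
p_real = np.array([[0.98, 0.01, 0.01],
                   [0.01, 0.98, 0.01],
                   [0.01, 0.01, 0.98]])
p_fake = np.full((4, 3), 1.0 / 3.0)
m, cr, cf = catgan_discriminator_terms(p_real, p_fake)
```

The generator is trained against the opposite pressure: it tries to produce samples the classifier assigns confidently to one of the K classes, which is what makes the generated samples class-structured.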


Introduction

- Learning non-linear classifiers from unlabeled or only partially labeled data, where y ∈ {1, ..., K} denotes the unknown label of an example, is a long-standing problem in machine learning.
- By utilizing both labeled and unlabeled examples from the data distribution one hopes to learn a representation that captures this shared structure.
- Such a representation might, subsequently, help classifiers trained using only a few labeled examples to generalize to parts of the data distribution that it would otherwise have no information about.
- Unsupervised categorization of data is an often sought-after tool for discovering groups in datasets with unknown class structure

Highlights

- Learning non-linear classifiers from unlabeled or only partially labeled data is a long-standing problem in machine learning
- We trained unsupervised categorical generative adversarial networks on MNIST, LFW and CIFAR-10 and plot samples generated by these models in Figure 3
- As an additional quantitative evaluation we compared the unsupervised categorical generative adversarial networks model trained on MNIST with other generative models based on the log-likelihood of generated samples
- In brief: the categorical generative adversarial networks model performs comparably to the best existing algorithms, achieving a log-likelihood of 237 ± 6 on MNIST; in comparison, Goodfellow et al. (2014) report 225 ± 2 for generative adversarial networks. Note that this does not necessarily mean that the categorical generative adversarial networks model is superior, as comparing generative models with respect to log-likelihood measured by a Parzen-window estimate can be misleading (see Theis et al. (2015) for a recent in-depth discussion)
- We have presented categorical generative adversarial networks, a framework for robust unsupervised and semi-supervised learning
- We found the proposed method to yield classification performance that is competitive with state-of-the-art results for semi-supervised image classification, and further confirmed that the generator, which is learned alongside the classifier, is capable of generating images of high visual fidelity

Results

**EVALUATION OF THE GENERATIVE MODEL**

- The authors qualitatively evaluate the capabilities of the generative model.
- The authors trained an unsupervised CatGAN on MNIST, LFW and CIFAR-10 and plotted samples generated by these models in Figure 3.
- As an additional quantitative evaluation the authors compared the unsupervised CatGAN model trained on MNIST with other generative models based on the log-likelihood of generated samples.
- The CatGAN model achieves a log-likelihood of 237 ± 6 on MNIST; in comparison, Goodfellow et al. (2014) report 225 ± 2 for generative adversarial networks. Note that this does not necessarily mean that the CatGAN model is superior, as comparing generative models with respect to log-likelihood measured by a Parzen-window estimate can be misleading (see Theis et al. (2015) for a recent in-depth discussion).
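The Parzen-window log-likelihood used for such comparisons can be sketched as follows. This is a generic isotropic-Gaussian estimator; the function name and bandwidth handling are illustrative assumptions, not the authors' exact evaluation code (which follows the protocol of Goodfellow et al. (2014)):

```python
import numpy as np

def parzen_log_likelihood(samples, test_points, sigma):
    """Mean log-likelihood of `test_points` under an isotropic-Gaussian
    Parzen-window estimator fit to generated `samples`.

    samples: (N, d) generated data; test_points: (M, d) held-out data;
    sigma: kernel bandwidth, typically tuned on a validation set.
    """
    n, d = samples.shape
    # (M, N) matrix of scaled squared distances for the Gaussian kernel
    diffs = test_points[:, None, :] - samples[None, :, :]
    a = -np.sum(diffs ** 2, axis=-1) / (2.0 * sigma ** 2)
    # numerically stable log-mean-exp over the N kernel centers
    amax = a.max(axis=1, keepdims=True)
    log_mean = amax[:, 0] + np.log(np.mean(np.exp(a - amax), axis=1))
    # Gaussian normalization constant in d dimensions
    log_norm = -0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    return float(np.mean(log_mean + log_norm))
```

The caveat raised by Theis et al. (2015) is visible in this construction: the estimate depends strongly on the bandwidth and on how well a finite set of generated samples covers the data manifold, so it can rank models differently than their true likelihoods would.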

Conclusion

- The authors have presented categorical generative adversarial networks, a framework for robust unsupervised and semi-supervised learning.
- The authors' method combines neural network classifiers with an adversarial generative model that regularizes a discriminatively trained classifier.
- The authors found the proposed method to yield classification performance that is competitive with state-of-the-art results for semi-supervised image classification, and further confirmed that the generator, which is learned alongside the classifier, is capable of generating images of high visual fidelity.
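The combination of a discriminative classifier with the adversarial regularizer can be sketched for the semi-supervised case as a cross-entropy term on the few labeled examples plus the unsupervised entropy terms. The helper names and the weighting `lam` below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of each row of an (N, K) probability matrix."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def semi_supervised_loss(p_labeled, y, p_unlabeled, p_fake, lam=1.0, eps=1e-12):
    """Illustrative semi-supervised discriminator loss (to be minimized).

    Cross-entropy on labeled data, plus entropy terms that ask for
    confident predictions on unlabeled real data, uncertain predictions
    on generated data, and balanced use of all classes on average.
    `lam` is an assumed trade-off hyperparameter.
    """
    ce = -np.mean(np.log(p_labeled[np.arange(len(y)), y] + eps))
    cond_real = entropy(p_unlabeled).mean()                          # want low
    cond_fake = entropy(p_fake).mean()                               # want high
    marginal = entropy(p_unlabeled.mean(axis=0, keepdims=True))[0]   # want high
    return ce + lam * (cond_real - cond_fake - marginal)

# Usage: correct labeled predictions yield a lower loss than wrong ones.
y = np.array([0, 1])
p_good = np.array([[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]])
p_bad = np.array([[0.05, 0.9, 0.05], [0.9, 0.05, 0.05]])
p_unlabeled = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]])
p_fake = np.full((2, 3), 1.0 / 3.0)
loss_good = semi_supervised_loss(p_good, y, p_unlabeled, p_fake)
loss_bad = semi_supervised_loss(p_bad, y, p_unlabeled, p_fake)
```

The labeled term anchors the cluster identities to actual class labels, while the unsupervised terms let the remaining unlabeled data shape the decision boundaries.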

Summary


- Table 1: Classification error, in percent, for the permutation-invariant MNIST problem with a reduced number of labels. Results are averaged over 10 different sets of labeled examples
- Table 2: Classification error, in percent, for different learning methods in combination with convolutional neural networks (CNNs) with a reduced number of labels
- Table 3: Classification error for different methods on the CIFAR-10 dataset (without data augmentation) for the full dataset and a reduced set of 400 labeled examples per class
- Table 4: The discriminator and generator CNNs used for MNIST
- Table 5: The discriminator and generator CNNs used for CIFAR-10
- Table 6: Comparison between different generative models on MNIST

Funding

- This work was funded by the German Research Foundation (DFG) within the priority program “Autonomous Learning” (SPP 1597)

References

- Bachman, Phil, Alsharif, Ouais, and Precup, Doina. Learning with pseudo-ensembles. In Advances in Neural Information Processing Systems (NIPS) 27, pp. 3365–3373. Curran Associates, Inc., 2014.
- Bastien, Frederic, Lamblin, Pascal, Pascanu, Razvan, Bergstra, James, Goodfellow, Ian J., Bergeron, Arnaud, Bouchard, Nicolas, and Bengio, Yoshua. Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012.
- Bengio, Yoshua, Thibodeau-Laufer, Eric, and Yosinski, Jason. Deep generative stochastic networks trainable by backprop. In Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.
- Bergstra, James, Breuleux, Olivier, Bastien, Frederic, Lamblin, Pascal, Pascanu, Razvan, Desjardins, Guillaume, Turian, Joseph, Warde-Farley, David, and Bengio, Yoshua. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), 2010.
- Bridle, John S., Heading, Anthony J. R., and MacKay, David J. C. Unsupervised classifiers, mutual information and phantom targets. In Advances in Neural Information Processing Systems (NIPS) 4. MIT Press, 1992.
- Denton, Emily, Chintala, Soumith, Szlam, Arthur, and Fergus, Rob. Deep generative image models using a laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems (NIPS) 28, 2015.
- Dieleman, Sander, Schlüter, Jan, Raffel, Colin, Olson, Eben, Sønderby, Søren Kaae, Nouri, Daniel, Maturana, Daniel, Thoma, Martin, Battenberg, Eric, Kelly, Jack, Fauw, Jeffrey De, Heilman, Michael, and et al. Lasagne: First release., August 2015. URL http://dx.doi.org/10.5281/zenodo.27878.
- Dosovitskiy, A., Springenberg, J. T., and Brox, T. Learning to generate chairs with convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- Dosovitskiy, Alexey, Springenberg, Jost Tobias, Riedmiller, Martin, and Brox, Thomas. Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS) 27. Curran Associates, Inc., 2014.
- Ester, Martin, Kriegel, Hans-Peter, Sander, Jörg, and Xu, Xiaowei. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. of 2nd International Conference on Knowledge Discovery and Data Mining (KDD), 1996.
- Fei-Fei, L., Fergus, R., and Perona, P. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:594–611, April 2006.
- Funk, Simon. SMORMS3 - blog entry: RMSprop loses to SMORMS3 - beware the epsilon! http://sifter.org/simon/journal/20150420.html, 2015.
- Gauthier, Jon. Conditional generative adversarial networks for face generation. Class Project for Stanford CS231N, 2014.
- Goodfellow, Ian, Mirza, Mehdi, Courville, Aaron, and Bengio, Yoshua. Multi-prediction deep boltzmann machines. In Advances in Neural Information Processing Systems (NIPS) 26. Curran Associates, Inc., 2013.
- Goodfellow, Ian, Pouget-Abadie, Jean, Mirza, Mehdi, Xu, Bing, Warde-Farley, David, Ozair, Sherjil, Courville, Aaron, and Bengio, Yoshua. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS) 27. Curran Associates, Inc., 2014.
- Grandvalet, Yves and Bengio, Yoshua. Semi-supervised learning by entropy minimization. In Advances in Neural Information Processing Systems (NIPS) 17. MIT Press, 2005.
- Hinton, G E and Salakhutdinov, R R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, July 2006.
- Hinton, Geoffrey E., Srivastava, Nitish, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan R. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580v3, 2012. URL http://arxiv.org/abs/1207.0580v3.
- Huang, Gary B., Ramesh, Manu, Berg, Tamara, and Learned-Miller, Erik. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
- Hui, Ka Y. Direct modeling of complex invariances for visual object features. In Proceedings of the 30th International Conference on Machine Learning (ICML). JMLR Workshop and Conference Proceedings, 2013.
- Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML). JMLR Proceedings, 2015.
- Kingma, Diederik and Ba, Jimmy. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
- Kingma, Diederik P, Mohamed, Shakir, Jimenez Rezende, Danilo, and Welling, Max. Semisupervised learning with deep generative models. In Advances in Neural Information Processing Systems (NIPS) 27. Curran Associates, Inc., 2014.
- Krause, Andreas, Perona, Pietro, and Gomes, Ryan G. Discriminative clustering by regularized information maximization. In Advances in Neural Information Processing Systems (NIPS) 23. MIT Press, 2010.
- Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.
- LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989.
- Lee, Dong-Hyun. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, 2013.
- Li, Yujia, Swersky, Kevin, and Zemel, Richard S. Generative moment matching networks. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.
- Mirza, Mehdi and Osindero, Simon. Conditional generative adversarial nets. CoRR, abs/1411.1784, 2014. URL http://arxiv.org/abs/1411.1784.
- Osendorfer, Christian, Soyer, Hubert, and van der Smagt, Patrick. Image super-resolution with fast approximate convolutional sparse coding. In ICONIP, Lecture Notes in Computer Science. Springer International Publishing, 2014.
- Rasmus, Antti, Valpola, Harri, Honkala, Mikko, Berglund, Mathias, and Raiko, Tapani. Semisupervised learning with ladder network. In Advances in Neural Information Processing Systems (NIPS) 28, 2015.
- Rifai, Salah, Dauphin, Yann N, Vincent, Pascal, Bengio, Yoshua, and Muller, Xavier. The manifold tangent classifier. In Advances in Neural Information Processing Systems (NIPS) 24. Curran Associates, Inc., 2011.
- Salakhutdinov, Ruslan and Hinton, Geoffrey. Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.
- Schaul, Tom, Zhang, Sixin, and LeCun, Yann. No More Pesky Learning Rates. In International Conference on Machine Learning (ICML), 2013.
- Springenberg, Jost Tobias, Dosovitskiy, Alexey, Brox, Thomas, and Riedmiller, Martin. Striving for simplicity: The all convolutional net. CoRR, abs/1412.6806, 2015.
- Theis, Lucas, van den Oord, Aaron, and Bethge, Matthias. A note on the evaluation of generative models. CoRR, abs/1511.01844, 2015. URL http://arxiv.org/abs/1511.01844.
- Tieleman, T. and Hinton, G. Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
- Vincent, Pascal, Larochelle, Hugo, Bengio, Yoshua, and Manzagol, Pierre-Antoine. Extracting and composing robust features with denoising autoencoders. In Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML), 2008.
- Weston, J., Ratle, F., Mobahi, H., and Collobert, R. Deep learning via semi-supervised embedding. In Montavon, G., Orr, G., and Muller, K-R. (eds.), Neural Networks: Tricks of the Trade. Springer, 2012.
- Xu, Linli, Neufeld, James, Larson, Bryce, and Schuurmans, Dale. Maximum margin clustering. In Advances in Neural Information Processing Systems (NIPS) 17. MIT Press, 2005.
- Zeiler, Matthew D., Taylor, Graham W., and Fergus, Rob. Adaptive deconvolutional networks for mid and high level feature learning. In IEEE International Conference on Computer Vision, ICCV, pp. 2018–2025, 2011.
- Zhao, Junbo, Mathieu, Michael, Goroshin, Ross, and Lecun, Yann. Stacked what-where autoencoders. CoRR, abs/1506.02351, 2015. URL http://arxiv.org/abs/1506.02351.
