# Semi-Supervised Learning via Compact Latent Space Clustering

In International Conference on Machine Learning (ICML), pp. 2464–2473, 2018.

Abstract:

We present a novel cost function for semi-supervised learning of neural networks that encourages compact clustering of the latent space to facilitate separation. The key idea is to dynamically create a graph over embeddings of labeled and unlabeled samples of a training batch to capture underlying structure in feature space, and use label […]

Introduction

- Semi-supervised learning (SSL) addresses the problem of learning a model by effectively leveraging both labeled and unlabeled data (Chapelle et al, 2006).
- SSL is effective when it results in a model that generalizes better than a model learned from labeled data only.
- Let X be a sample space with data points and Y the set of labels.
- Let DL ⊆ X × Y be a set of labeled data points, and let DU ⊆ X be a set of unlabeled data.
- Leveraging ample unlabeled data allows the model to capture the structure of the data more faithfully
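The labeled/unlabeled split defined above can be made concrete with a toy NumPy setup; the dataset sizes and the two-class rule below are illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sample space X: 100 two-dimensional points, of which only 10 keep labels.
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)                    # ground-truth labels in Y = {0, 1}

labeled_idx = rng.choice(100, size=10, replace=False)
unlabeled_idx = np.setdiff1d(np.arange(100), labeled_idx)

D_L = (X[labeled_idx], y[labeled_idx])           # D_L ⊆ X × Y: (points, labels)
D_U = X[unlabeled_idx]                           # D_U ⊆ X: labels withheld
```

SSL aims to use both `D_L` and `D_U` during training, rather than discarding the 90 unlabeled points.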

Highlights

- Semi-supervised learning (SSL) addresses the problem of learning a model by effectively leveraging both labeled and unlabeled data (Chapelle et al, 2006)
- We study the dynamics of Compact Clustering via Label Propagation (CCLP, S = 10) on the two-circles dataset (Fig. 2) when a single labeled example is given per class
- CCLP consistently improves performance over standard supervision even when all labels in the training set are used for supervision, indicating that CCLP could also serve as a latent-space regularizer in fully supervised systems
- We emphasize that our method consists of the computation of a single cost function and does not require additional network components, such as the generators required by variational auto-encoders and generative adversarial networks, or the density estimator PixelCNN++ used in Dai et al (2017)
- We have presented a novel regularization technique for supervised learning, based on the idea of forming compact clusters in the latent space of a neural network while preserving existing clusters during optimization
- We showed that our approach is effective in leveraging unlabeled samples via empirical evaluation on three widely used image classification benchmarks

Methods

- In this work the authors take the labeling function f(x; θ) to be a multi-layer neural network.
- This model can be decomposed into a feature extractor z(x; θz) ∈ Z parametrized by θz, and a classifier g(z(x; θz); θg) with parameters θg.
- The authors argue that classification is improved whenever data from each class form compact, well separated clusters in feature space Z.
- The authors introduce a regularizer (Section 3.2) that 1) encourages compact clustering according to propagated labels and 2) avoids disturbing existing clusters during optimization (Fig. 1)
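The decomposition f(x; θ) = g(z(x; θz); θg) can be sketched directly in NumPy. The one-layer ReLU extractor and the linear classifier head below are placeholder choices for illustration, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def z(x, theta_z):
    """Feature extractor z(x; θz): maps inputs into the latent space Z."""
    return np.maximum(x @ theta_z, 0.0)          # single ReLU layer (placeholder)

def g(features, theta_g):
    """Classifier g(·; θg): maps latent features to class logits."""
    return features @ theta_g

theta_z = rng.normal(size=(4, 8))                # θz: input dim 4 → latent dim 8
theta_g = rng.normal(size=(8, 3))                # θg: latent dim 8 → 3 classes

x = rng.normal(size=(5, 4))                      # a batch of 5 inputs
logits = g(z(x, theta_z), theta_g)               # f(x; θ) = g(z(x; θz); θg)
```

The regularizer of Section 3.2 acts on the intermediate representation `z(x, theta_z)`, i.e. on the latent space Z, not on the logits.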

Results

- Performance of the method in comparison to recent SSL approaches that use similar experimental settings is reported in Table 1.
- CCLP consistently improves performance over standard supervision even when all labels in the training set are used for supervision, indicating that CCLP could be used as a latent space regularizer in fully supervised systems.
- In the limited-label settings, CCLP offers a greater improvement over the corresponding baselines than the most recent perturbation-based method, mean teacher (Tarvainen & Valpola, 2017).
- The compact clustering that the method encourages is orthogonal to previous approaches and could boost their performance further

Conclusion

- The authors have presented a novel regularization technique for SSL, based on the idea of forming compact clusters in the latent space of a neural network while preserving existing clusters during optimization.
- This is enabled by dynamically constructing a graph in latent space at each SGD iteration and propagating labels to estimate the manifold’s structure, which the authors regularize.
- Further analysis of the properties of a compactly clustered latent space, as well as application of the approach to larger benchmarks and to semantic segmentation, are interesting directions for future work
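The per-iteration graph construction described above can be sketched as follows: at each SGD step, a similarity graph is built over the current batch's latent embeddings. The Gaussian kernel and the temperature parameter here are assumptions for illustration; the paper's exact similarity measure may differ.

```python
import numpy as np

def batch_graph(Z, temperature=1.0):
    """Build a row-stochastic transition matrix over one batch's embeddings,
    recomputed from scratch at every SGD iteration."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-d2 / temperature)                        # Gaussian similarity (assumed kernel)
    np.fill_diagonal(W, 0.0)                             # no self-edges
    return W / W.sum(axis=1, keepdims=True)              # normalize rows to sum to 1

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 3))       # latent embeddings of a 6-sample batch
H = batch_graph(Z)                # transition matrix used for label propagation
```

Because the graph is rebuilt from the embeddings at every step, it tracks the latent space as it deforms during optimization.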


- Table 1: Performance of CCLP compared to contemporary SSL methods on common benchmarks, when limited or all available labelled data are used as DL for training. Also shown is the performance of the corresponding baseline with standard supervision (no SS). Error rate is shown as mean ± st.dev. Only results obtained without augmentation are shown. Methods in the lower part used larger classifiers

Related work

- The great potential and practical implications of utilizing unlabeled data have resulted in a large body of research on SSL. The techniques can be broadly categorized as follows.

2.1. Graph-Based Methods

These methods operate over an input graph with adjacency matrix A, where element Aij is the similarity between samples xi, xj ∈ DL ∪ DU. Similarity can be based on Euclidean distance (Zhu & Ghahramani, 2002) or on other, sometimes task-specific, metrics (Weston et al, 2012). Transductive inference for the graph’s unlabeled nodes is done based on the smoothness assumption: nearby samples should have similar class posteriors. Label propagation (LP) (Zhu & Ghahramani, 2002) iteratively propagates the class posterior of each node to its neighbors, faster through high-density regions, until a global equilibrium is reached. Zhu et al (2003) showed that for binary classification one arrives at the same solution by minimizing the energy

E(f) = 1/2 Σi,j Aij (f(xi) − f(xj))²,

with f clamped to the given labels on the labeled nodes.
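The iterative scheme can be sketched in a few lines: posteriors are repeatedly pushed to neighbors through a row-normalized transition matrix, with labeled nodes clamped to their known labels after each step. This is a minimal sketch of Zhu & Ghahramani's algorithm; the variable names and the toy chain graph are our own.

```python
import numpy as np

def label_propagation(A, labeled_idx, y_labeled, n_classes, n_iter=200):
    """Iterative label propagation: diffuse class posteriors along graph
    edges until equilibrium, clamping labeled nodes to their labels."""
    n = A.shape[0]
    T = A / A.sum(axis=1, keepdims=True)                 # row-normalized transitions
    F = np.full((n, n_classes), 1.0 / n_classes)         # uniform initial posteriors
    F[labeled_idx] = np.eye(n_classes)[y_labeled]
    for _ in range(n_iter):
        F = T @ F                                        # propagate to neighbors
        F[labeled_idx] = np.eye(n_classes)[y_labeled]    # re-clamp labeled nodes
    return F

# Chain graph 0–1–2–3 with one labeled endpoint per class.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
F = label_propagation(A, labeled_idx=[0, 3], y_labeled=[0, 1], n_classes=2)
pred = F.argmax(axis=1)   # each interior node adopts its nearer endpoint's class
```

At equilibrium the interior posteriors are harmonic averages of their neighbors, which is why minimizing the energy above yields the same solution.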

Funding

- This project has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 757173, project MIRA, ERC-2017-STG)
- KK is also supported by the President’s PhD Scholarship of Imperial College London
- DC is supported by CAPES, Ministry of Education, Brazil (BEX 1500/15-05)
- LLF is funded through EPSRC Healthcare Impact Partnerships grant (EP/P023509/1)
- IW is supported by the Natural Environment Research Council (NERC)

Reference

- Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI 2016), volume 16, pp. 265–283, 2016.
- Atwood, J. and Towsley, D. Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1993–2001, 2016.
- Bachman, P., Alsharif, O., and Precup, D. Learning with pseudo-ensembles. In Advances in Neural Information Processing Systems, pp. 3365–3373, 2014.
- Belkin, M., Niyogi, P., and Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7(Nov):2399–2434, 2006.
- Bishop, C. M. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1):108–116, 1995.
- Blum, A. and Mitchell, T. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pp. 92–100, 1998.
- Dumoulin, V., Belghazi, I., Poole, B., Lamb, A., Arjovsky, M., Mastropietro, O., and Courville, A. Adversarially learned inference. In International Conference on Learning Representations, 2017.
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
- Grandvalet, Y. and Bengio, Y. Semi-supervised learning by entropy minimization. In Advances in Neural Information Processing Systems, pp. 529–536, 2005.
- Haeusser, P., Mordvintsev, A., and Cremers, D. Learning by association – a versatile semi-supervised training method for neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017.
- Kingma, D. P., Mohamed, S., Rezende, D. J., and Welling, M. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, pp. 3581–3589, 2014.
- Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations, 2017.
- Kondor, R. I. and Lafferty, J. Diffusion kernels on graphs and other discrete input spaces. In Proceedings of the 19th International Conference on Machine Learning, volume 2, pp. 315–322, 2002.
- Laine, S. and Aila, T. Temporal ensembling for semi-supervised learning. International Conference on Learning Representations, 2017.
- Lee, D.-H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, volume 3, pp. 2, 2013.
- Li, C., Zhu, J., and Zhang, B. Max-margin deep generative models for (semi-) supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
- Chapelle, O., Scholkopf, B., and Zien, A. Semi-supervised Learning. MIT Press, Cambridge, Mass., USA, 2006.
- Chongxuan, L., Xu, T., Zhu, J., and Zhang, B. Triple generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 4091–4101, 2017.
- Dai, Z., Yang, Z., Yang, F., Cohen, W. W., and Salakhutdinov, R. R. Good semi-supervised learning that requires a bad GAN. In Advances in Neural Information Processing Systems, pp. 6513–6523, 2017.
- Maaløe, L., Sønderby, C. K., Sønderby, S. K., and Winther, O. Auxiliary deep generative models. In Proceedings of the 33rd International Conference on International Conference on Machine Learning, pp. 1445–1454, 2016.
- Maaten, L. v. d. and Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov): 2579–2605, 2008.
- McLachlan, G. Discriminant Analysis and Statistical Pattern Recognition, volume 544. John Wiley & Sons, 2004.
- Miyato, T., Maeda, S.-i., Koyama, M., and Ishii, S. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. arXiv preprint arXiv:1704.03976, 2017.
- Ranzato, M. and Szummer, M. Semi-supervised learning of compact document representations with deep networks. In Proceedings of the 25th International Conference on Machine Learning, pp. 792–799, 2008.
- Rasmus, A., Berglund, M., Honkala, M., Valpola, H., and Raiko, T. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pp. 3546–3554, 2015.
- Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pp. 2234–2242, 2016.
- Scudder, H. Probability of error of some adaptive patternrecognition machines. IEEE Transactions on Information Theory, 11(3):363–371, 1965.
- Springenberg, J. T. Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv:1511.06390, 2015.
- Tarvainen, A. and Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems, pp. 1195–1204, 2017.
- Zhou, D., Bousquet, O., Lal, T. N., Weston, J., and Scholkopf, B. Learning with local and global consistency. In Advances in Neural Information Processing Systems, pp. 321–328, 2004.
- Zhu, X. Semi-supervised Learning with Graphs. PhD thesis, Carnegie Mellon University, Language Technologies Institute, School of Computer Science, 2005.
- Zhu, X. and Ghahramani, Z. Learning from labeled and unlabeled data with label propagation. Technical Report, Carnegie Mellon University, 2002.
- Zhu, X., Ghahramani, Z., and Lafferty, J. D. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning, pp. 912–919, 2003.
