Semi-supervised Deep Learning with Memory
ECCV, pp. 275-291, 2018.
Keywords:
Semi-Supervised Learning, abundant unlabelled data, external memory, Convolutional Neural Network, benchmark dataset
Abstract:
We consider the semi-supervised multi-class classification problem of learning from sparse labelled and abundant unlabelled training data. To address this problem, existing semi-supervised deep learning methods often rely on the up-to-date “network-in-training” to formulate the semi-supervised learning objective. This ignores both the dis...
Introduction
- Semi-supervised learning (SSL) aims to boost the model performance by utilising the large amount of unlabelled data when only a limited amount of labelled data is available [4, 37].
- The motivation is that unlabelled data are available at large scale, whereas labelled data are scarce due to high labelling costs.
- In the SSL literature, the most straightforward SSL algorithm is self-training, where the target model is incrementally trained on additional self-labelled data given by the model's own high-confidence predictions [21, 2, 25] (a minimal loop is sketched after this list).
- Other common methods include Transductive SVM [10, 3] and graph-based methods [39, 1], which are likely to suffer from poor scalability to large-scale unlabelled data due to inefficient optimisation.
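As a concrete reference for the generic self-training procedure above, here is a minimal Python sketch, assuming a scikit-learn-style classifier with `fit`/`predict_proba`; the confidence threshold `tau` and the round budget are hypothetical choices, not values from this paper.

```python
import numpy as np

def self_train(model, X_lab, y_lab, X_unlab, tau=0.95, max_rounds=10):
    """Generic self-training: repeatedly add confidently self-labelled samples
    to the labelled pool and retrain. Illustrative only, not this paper's method."""
    X_l, y_l, X_u = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(max_rounds):
        model.fit(X_l, y_l)                        # retrain on the current labelled pool
        if len(X_u) == 0:
            break
        probs = model.predict_proba(X_u)           # class posteriors for unlabelled samples
        conf, pseudo = probs.max(axis=1), probs.argmax(axis=1)
        keep = conf >= tau                         # accept only high-confidence predictions
        if not keep.any():
            break
        X_l = np.concatenate([X_l, X_u[keep]])     # grow the labelled pool with pseudo-labels
        y_l = np.concatenate([y_l, pseudo[keep]])
        X_u = X_u[~keep]                           # drop the newly pseudo-labelled samples
    model.fit(X_l, y_l)                            # final fit on the enlarged pool
    return model
```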
Highlights
- Semi-supervised learning (SSL) aims to boost the model performance by utilising the large amount of unlabelled data when only a limited amount of labelled data is available [4, 37]
- The memory size of the memory module in the Memory-Assisted Deep Neural Network is only proportional to the number of classes, whereas Temporal Ensembling requires storing the predictions of all samples in a large mapped file, with a memory size proportional to the number of training samples. Unlike generative models including DGM, CatGAN, ADGM, SDGM, ImpGAN, and ALI, our Memory-Assisted Deep Neural Network does not need to generate additional synthetic images during training, resulting in more efficient model training (a class-keyed memory sketch follows this list).
- We evaluate the individual contribution of two loss terms in the memory loss formulation (Eq (7)): (1) the Model Entropy (ME) (Eq (8)), and (2) the Memory-Network Divergence (MND) (Eq (9))
- We present a novel Memory-Assisted Deep Neural Network (MA-DNN) to enable semi-supervised deep learning on sparsely labelled and abundant unlabelled training data.
- The Memory-Assisted Deep Neural Network is established on the idea of exploiting the memory of model learning to more reliably and effectively learn from the unlabelled training data
- Extensive comparative evaluations on three semi-supervised image classification benchmark datasets validate the advantages of the proposed Memory-Assisted Deep Neural Network over a wide range of state-of-the-art methods
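To illustrate why the memory footprint scales with the number of classes rather than the number of training samples, here is a minimal sketch of a class-keyed memory in PyTorch. The slot contents (a per-class feature summary and a per-class probability summary) and the moving-average update rule are illustrative assumptions, not the paper's exact equations.

```python
import torch

class ClassKeyedMemory:
    """A key-value memory with exactly one slot per class, so its size is
    O(num_classes) rather than O(num_training_samples). Sketch only."""
    def __init__(self, num_classes, feat_dim, momentum=0.5):
        self.keys = torch.zeros(num_classes, feat_dim)        # per-class feature summary
        self.values = torch.full((num_classes, num_classes),  # per-class probability summary
                                 1.0 / num_classes)
        self.momentum = momentum                               # hypothetical update rate

    @torch.no_grad()
    def update(self, feats, probs, labels):
        """Accommodation: fold labelled-batch statistics into their class slots."""
        for c in labels.unique():
            m = labels == c
            self.keys[c] = self.momentum * self.keys[c] + (1 - self.momentum) * feats[m].mean(0)
            self.values[c] = self.momentum * self.values[c] + (1 - self.momentum) * probs[m].mean(0)
            self.values[c] /= self.values[c].sum()             # keep a valid distribution

    @torch.no_grad()
    def read(self, feats):
        """Assimilation: soft-address the slots by feature similarity and return
        a memory prediction for each (possibly unlabelled) sample."""
        sim = -torch.cdist(feats, self.keys)                   # negative Euclidean distance
        addr = torch.softmax(sim, dim=1)                       # addressing weights over classes
        return addr @ self.values                              # blended memory prediction
```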
Methods
- In Table 1, the authors compare the model against 11 state-of-the-art methods, using their reported results on SVHN, CIFAR10 and CIFAR100.
- Among all these methods, Mean Teacher is the only one that slightly outperforms the MA-DNN on the digit classification task.
- Eliminating the MND term causes a performance drop of 2.54% (6.75 − 4.21), 5.50% (17.41 − 11.91), and 7.39% (41.90 − 34.51) on SVHN, CIFAR10, and CIFAR100 respectively (the memory loss terms are sketched after this list).
- This indicates the effectiveness of encouraging the network predictions to be consistent with reliable memory predictions derived from the memory of model learning.
- The authors adopt the supervised counterpart CNN-Supervised, trained only on the same labelled data without any unlabelled data, as the baseline.
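The sketch below illustrates the two memory loss terms discussed above. It assumes ME penalises the entropy of the memory prediction and MND is a KL divergence that pulls the network prediction toward the memory prediction; the exact definitions are given in Eqs (7)-(9) of the paper.

```python
import torch
import torch.nn.functional as F

def memory_loss(net_logits, mem_probs, eps=1e-8):
    """Hedged sketch of the memory loss: Model Entropy (ME) plus
    Memory-Network Divergence (MND). `mem_probs` are memory predictions
    (no gradient), `net_logits` are the network's unnormalised scores."""
    net_log_probs = F.log_softmax(net_logits, dim=1)
    me = -(mem_probs * (mem_probs + eps).log()).sum(dim=1).mean()     # ME: entropy of memory prediction
    mnd = F.kl_div(net_log_probs, mem_probs, reduction='batchmean')   # MND: KL(memory || network)
    return me + mnd
```

In training, a term of this kind would be added to the standard supervised cross-entropy on labelled samples, possibly with a balancing weight; see the training-step sketch in the Conclusion section below.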
Results
- CIFAR10 [13]: A natural image dataset containing 50,000/10,000 training/test images from 10 object classes.
- Following the standard semi-supervised classification protocol [12, 24, 30, 19], the authors randomly divide the training data into a small labelled set and a large unlabelled set.
- The number of labelled training images is 1,000/4,000/10,000 on SVHN/CIFAR10/CIFAR100 respectively, with the remaining 72,257/46,000/40,000 images used as unlabelled training data (a minimal split sketch follows this list).
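A minimal sketch of the random labelled/unlabelled split described above. It assumes the labelled subset is sampled class-balanced (e.g. 4,000/10 = 400 per class on CIFAR10), which is common in SSL protocols; the paper's exact sampling procedure is not restated here.

```python
import numpy as np

def split_labelled_unlabelled(y, num_labelled, num_classes, seed=0):
    """Randomly split training indices into a small class-balanced labelled set
    and a large unlabelled set. Illustrative protocol sketch."""
    rng = np.random.default_rng(seed)
    per_class = num_labelled // num_classes
    labelled_idx = []
    for c in range(num_classes):
        idx_c = np.flatnonzero(y == c)                               # indices of class c
        labelled_idx.append(rng.choice(idx_c, size=per_class, replace=False))
    labelled_idx = np.concatenate(labelled_idx)
    unlabelled_idx = np.setdiff1d(np.arange(len(y)), labelled_idx)   # everything else is unlabelled
    return labelled_idx, unlabelled_idx
```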
Conclusion
- The authors present a novel Memory-Assisted Deep Neural Network (MA-DNN) to enable semi-supervised deep learning on sparsely labelled and abundant unlabelled training data.
- The MA-DNN is established on the idea of exploiting the memory of model learning to more reliably and effectively learn from the unlabelled training data.
- The authors formulate a novel assimilation-accommodation interaction between the network and an external memory module, capable of facilitating more effective semi-supervised deep learning by imposing a memory loss derived from the incrementally updated memory module (a schematic training step follows this list).
- The authors provide detailed ablation studies and further analysis to give insights on the model design and performance gains.
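To tie the pieces together, the sketch below shows one hypothetical training iteration of the assimilation-accommodation cycle: a supervised cross-entropy on labelled data plus the memory loss on unlabelled data, followed by a memory refresh with the labelled batch. Names such as `ClassKeyedMemory`, `memory_loss`, and the assumption that the network returns (features, logits) refer to the earlier sketches and are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

def train_step(net, memory, optimiser, x_lab, y_lab, x_unlab, unsup_weight=1.0):
    """One illustrative iteration: supervised loss on labelled data, memory loss
    on unlabelled data, then a memory update. `unsup_weight` is a hypothetical
    balancing coefficient."""
    optimiser.zero_grad()

    feats_l, logits_l = net(x_lab)                   # assumes net returns (features, logits)
    sup_loss = F.cross_entropy(logits_l, y_lab)      # standard supervised term

    feats_u, logits_u = net(x_unlab)
    mem_probs_u = memory.read(feats_u)               # assimilation: memory predictions for unlabelled data
    unsup_loss = memory_loss(logits_u, mem_probs_u)  # ME + MND terms from the earlier sketch

    loss = sup_loss + unsup_weight * unsup_loss
    loss.backward()
    optimiser.step()

    # accommodation: refresh the class slots with the labelled batch statistics
    memory.update(feats_l.detach(), F.softmax(logits_l, dim=1).detach(), y_lab)
    return loss.item()
```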
Tables
- Table 1: Evaluation on semi-supervised image classification benchmarks in comparison to state-of-the-art methods. Metric: error rate (%) ± standard deviation, lower is better. “–” indicates no reported result; “∗” indicates generative models.
- Table 2: Evaluation on the effect of individual memory loss terms. Metric: error rate (%) ± standard deviation, lower is better. ME: Model Entropy; MND: Memory-Network Divergence.
Related work
- Semi-supervised deep learning has recently gained increasing attention due to the strong generalisation power of deep neural networks [35, 15, 12, 30, 24, 19, 14]. A common strategy is to train the deep neural network by simultaneously optimising a standard supervised classification loss on labelled samples along with an additional unsupervised loss term imposed on either unlabelled data [15, 27, 5] or both labelled and unlabelled data [35, 24, 19, 14]. These additional loss terms serve as unsupervised supervision signals, since ground-truth labels are not required to derive their values. For example, Lee [15] utilises the cross-entropy loss computed on the pseudo labels (the classes with the maximum predicted probability given by the up-to-date network) of unlabelled samples as an additional supervision signal. Rasmus et al. [24] adopt the reconstruction loss between one clean forward propagation and one stochastically corrupted forward propagation of the same sample. Miyato et al. [19] define the distributional smoothness against local random perturbation as an unsupervised penalty. Laine et al. [14] introduce an unsupervised L2 loss to penalise the inconsistency between the network predictions and the temporally ensembled network predictions. Overall, the rationale of these SSL algorithms is to regularise the network by enforcing smooth and consistent classification boundaries that are robust to random perturbation [24, 19], or to enrich the supervision signals by exploiting the knowledge learned by the network, such as the pseudo labels [15] or the temporally ensembled predictions [14] (a sketch of one such consistency term follows).
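As a concrete illustration of the "additional unsupervised loss" pattern above, here is a minimal sketch in the style of the temporal-ensembling consistency penalty of Laine et al. [14]: an L2 loss between current predictions and a bias-corrected exponential moving average of past predictions. The EMA rate `alpha` and the bookkeeping details are assumptions, not the published hyperparameters.

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits, ensemble_targets, step, alpha=0.6):
    """Temporal-ensembling style consistency sketch [14]: penalise the L2 gap
    between current predictions and the bias-corrected EMA of past predictions.
    Returns the loss and the updated EMA tensor for the same samples."""
    probs = F.softmax(logits, dim=1)
    ensemble_targets = alpha * ensemble_targets + (1 - alpha) * probs.detach()  # accumulate EMA
    targets = ensemble_targets / (1 - alpha ** (step + 1))                      # startup bias correction
    loss = F.mse_loss(probs, targets)
    return loss, ensemble_targets
```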
Funding
- This work was partly supported by the China Scholarship Council, Vision Semantics Limited, the Royal Society Newton Advanced Fellowship Programme (NA150459) and Innovate UK Industrial Challenge Project on Developing and Commercialising Intelligent Video Analytics Solutions for Public Safety (98111571149)
References
- Blum, A., Lafferty, J., Rwebangira, M.R., Reddy, R.: Semi-supervised learning using randomized mincuts. In: International Conference on Machine Learning (2004)
- Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the eleventh annual conference on Computational learning theory. ACM (1998)
- Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: Tenth International Workshop on Artificial Intelligence and Statistics (2005)
- Chapelle, O., Schölkopf, B., Zien, A.: Semi-supervised learning. The MIT Press (2010)
- Dumoulin, V., Belghazi, I., Poole, B., Lamb, A., Arjovsky, M., Mastropietro, O., Courville, A.: Adversarially learned inference. In: International Conference on Learning Representation (2017)
- Fergus, R., Weiss, Y., Torralba, A.: Semi-supervised learning in gigantic image collections. In: Advances in Neural Information Processing Systems (2009)
- Ginsburg, H.P., Opper, S.: Piaget’s theory of intellectual development. Prentice-Hall, Inc. (1988)
- Grandvalet, Y., Bengio, Y.: Semi-supervised learning by entropy minimization. In: Advances in Neural Information Processing Systems (2005)
- Haeusser, P., Mordvintsev, A., Cremers, D.: Learning by association-a versatile semi-supervised training method for neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
- Joachims, T.: Transductive inference for text classification using support vector machines. In: International Conference on Machine Learning (1999)
- Kaiser, L., Nachum, O., Roy, A., Bengio, S.: Learning to remember rare events. In: International Conference on Learning Representation (2017)
- Kingma, D.P., Mohamed, S., Rezende, D.J., Welling, M.: Semi-supervised learning with deep generative models. In: Advances in Neural Information Processing Systems (2014)
- Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)
- Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. In: International Conference on Learning Representation (2017)
- Lee, D.H.: Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In: ICML Workshop on Challenges in Representation Learning (2013)
- Maaløe, L., Sønderby, C.K., Sønderby, S.K., Winther, O.: Auxiliary deep generative models. In: International Conference on Machine Learning (2016)
- Maaten, L.v.d., Hinton, G.: Visualizing data using t-SNE. The Journal of Machine Learning Research (2008)
- Miller, A., Fisch, A., Dodge, J., Karimi, A.H., Bordes, A., Weston, J.: Key-value memory networks for directly reading documents. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (2016)
- Miyato, T., Maeda, S.i., Koyama, M., Nakae, K., Ishii, S.: Distributional smoothing with virtual adversarial training. In: International Conference on Learning Representation (2016)
- Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS workshop on deep learning and unsupervised feature learning (2011)
- Nigam, K., Ghani, R.: Analyzing the effectiveness and applicability of co-training. In: Proceedings of the ninth international conference on Information and knowledge management. ACM (2000)
- Pereyra, G., Tucker, G., Chorowski, J., Kaiser, L., Hinton, G.: Regularizing neural networks by penalizing confident output distributions. In: International Conference on Learning Representation (2017)
- Ranzato, M., Szummer, M.: Semi-supervised learning of compact document representations with deep networks. In: International Conference on Machine Learning (2008)
- Rasmus, A., Berglund, M., Honkala, M., Valpola, H., Raiko, T.: Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems (2015)
- Rosenberg, C., Hebert, M., Schneiderman, H.: Semi-supervised self-training of object detection models. In: Seventh IEEE Workshop on Applications of Computer Vision. Citeseer (2005)
- Sajjadi, M., Javanmardi, M., Tasdizen, T.: Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In: Advances in Neural Information Processing Systems (2016)
- Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems (2016)
- Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.: Meta-learning with memory-augmented neural networks. In: International Conference on Machine Learning (2016)
- Shi, M., Zhang, B.: Semi-supervised learning improves gene expression-based prediction of cancer recurrence. Bioinformatics 27(21) (2011)
- Springenberg, J.T.: Unsupervised and semi-supervised learning with categorical generative adversarial networks. In: International Conference on Learning Representation (2016)
- Sukhbaatar, S., Weston, J., Fergus, R., et al.: End-to-end memory networks. In: Advances in Neural Information Processing Systems (2015)
- Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: Advances in Neural Information Processing Systems (2017)
- Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: European Conference on Computer Vision (2016)
- Weston, J., Chopra, S., Bordes, A.: Memory networks. In: International Conference on Learning Representation (2014)
- Weston, J., Ratle, F., Mobahi, H., Collobert, R.: Deep learning via semi-supervised embedding. In: International Conference on Machine Learning (2008)
- Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. In: Advances in Neural Information Processing Systems (2004)
- Zhu, X.: Semi-supervised learning literature survey. Computer Science, University of Wisconsin-Madison 2(3), 4 (2006)
- Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University (2002)
- Zhu, X., Ghahramani, Z., Lafferty, J.D.: Semi-supervised learning using Gaussian fields and harmonic functions. In: International Conference on Machine Learning (2003)