Semi-supervised Deep Learning with Memory

ECCV, pp. 275-291, 2018.

Keywords:
Semi-Supervised Learning, abundant unlabelled data, external memory, Convolutional Neural Network, benchmark dataset

Abstract:

We consider the semi-supervised multi-class classification problem of learning from sparse labelled and abundant unlabelled training data. To address this problem, existing semi-supervised deep learning methods often rely on the up-to-date “network-in-training” to formulate the semi-supervised learning objective. This ignores both the dis…

Introduction
  • Semi-supervised learning (SSL) aims to boost the model performance by utilising the large amount of unlabelled data when only a limited amount of labelled data is available [4, 37].
  • SSL is motivated by the observation that unlabelled data are available at large scale, whereas labelled data are scarce due to high labelling costs.
  • In the SSL literature, the most straightforward algorithm is self-training, where the target model is incrementally retrained on additional self-labelled data, i.e. its own high-confidence predictions on unlabelled samples [21, 2, 25] (see the sketch after this list).
  • Other common methods include Transductive SVM [10, 3] and graph-based methods [39, 1], which are likely to suffer from poor scalability to large-scale unlabelled data due to inefficient optimisation.
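
The self-training recipe mentioned above can be made concrete with a minimal, generic sketch (not this paper's method): a classifier is repeatedly retrained after absorbing unlabelled samples whose predicted class probability exceeds a confidence threshold. The function name, classifier choice, threshold and round count below are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(x_lab, y_lab, x_unlab, threshold=0.95, rounds=5):
        """Illustrative self-training loop: retrain, pseudo-label confident
        unlabelled samples, absorb them into the labelled set, repeat."""
        x_lab, y_lab = x_lab.copy(), y_lab.copy()
        model = LogisticRegression(max_iter=1000)
        for _ in range(rounds):
            model.fit(x_lab, y_lab)
            if len(x_unlab) == 0:
                break
            probs = model.predict_proba(x_unlab)            # class posteriors
            conf = probs.max(axis=1)
            pseudo = model.classes_[probs.argmax(axis=1)]   # self-assigned labels
            keep = conf >= threshold                        # confident predictions only
            if not keep.any():
                break
            x_lab = np.vstack([x_lab, x_unlab[keep]])       # grow the labelled set
            y_lab = np.concatenate([y_lab, pseudo[keep]])
            x_unlab = x_unlab[~keep]
        return model

The usual failure mode of this loop, which motivates more careful schemes, is that early mislabelled samples are absorbed permanently and reinforce their own errors.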
Highlights
  • Semi-supervised learning (SSL) aims to boost the model performance by utilising the large amount of unlabelled data when only a limited amount of labelled data is available [4, 37]
  • The memory size of the memory module in the Memory-Assisted Deep Neural Network is proportional only to the number of classes, whereas Temporal Ensembling requires storing the predictions of all samples in a large mapped file whose size is proportional to the number of training samples (a sketch of such a class-keyed memory appears after this list). Unlike generative models such as DGM, CatGAN, ADGM, SDGM, ImpGAN, and ALI, our Memory-Assisted Deep Neural Network does not need to generate additional synthetic images during training, resulting in more efficient model training.
  • We evaluate the individual contribution of the two loss terms in the memory loss formulation (Eq (7)): (1) the Model Entropy (ME) (Eq (8)), and (2) the Memory-Network Divergence (MND) (Eq (9)).
  • We present a novel Memory-Assisted Deep Neural Network (MA-DNN) to enable semi-supervised deep learning on sparsely labelled and abundant unlabelled training data.
  • The Memory-Assisted Deep Neural Network is established on the idea of exploiting the memory of model learning to more reliably and effectively learn from the unlabelled training data
  • Extensive comparative evaluations on three semi-supervised image classification benchmark datasets validate the advantages of the proposed Memory-Assisted Deep Neural Network over a wide range of state-of-the-art methods
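
To make the memory-footprint contrast above concrete, here is a minimal sketch (simplifying assumptions, not the paper's exact design) of a class-keyed memory: one feature prototype and one accumulated class distribution per class, refreshed by an exponential moving average ("accommodation") and read by feature similarity ("assimilation"). The class name ClassKeyedMemory, the momentum value and the softmax addressing are illustrative choices.

    import numpy as np

    class ClassKeyedMemory:
        """Illustrative per-class memory: one (feature prototype, class distribution)
        slot per class, so storage is O(num_classes), independent of dataset size."""

        def __init__(self, num_classes, feat_dim, momentum=0.9):
            self.keys = np.zeros((num_classes, feat_dim))       # class feature prototypes
            self.values = np.full((num_classes, num_classes),   # accumulated class posteriors
                                  1.0 / num_classes)
            self.momentum = momentum

        def update(self, feats, probs, labels):
            """Accommodation step (simplified): refresh the slot of each labelled class
            with a moving average of its batch features and predictions."""
            for c in np.unique(labels):
                mask = labels == c
                self.keys[c] = self.momentum * self.keys[c] + (1 - self.momentum) * feats[mask].mean(0)
                self.values[c] = self.momentum * self.values[c] + (1 - self.momentum) * probs[mask].mean(0)
                self.values[c] /= self.values[c].sum()           # keep a valid distribution

        def read(self, feats):
            """Assimilation step (simplified): soft-address the memory by feature
            similarity to produce a memory prediction for each sample."""
            sims = feats @ self.keys.T                           # similarity to each class key
            weights = np.exp(sims - sims.max(1, keepdims=True))
            weights /= weights.sum(1, keepdims=True)
            return weights @ self.values                         # memory class probabilities

A Temporal-Ensembling-style buffer would instead need a num_samples × num_classes array, which is the storage cost contrasted in the highlight above.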
Methods
  • In Table 1, the authors compare the model against 11 state-of-the-art methods, using their reported results on SVHN, CIFAR10 and CIFAR100.
  • Among all these methods, Mean Teacher is the only one that slightly outperforms the MA-DNN on the digit classification task.
  • Eliminating the MND term causes performance drops of 2.54% (6.75-4.21), 5.50% (17.41-11.91), and 7.39% (41.90-34.51) on SVHN, CIFAR10, and CIFAR100 respectively (a sketch of the two memory loss terms follows this list).
  • This indicates the effectiveness of encouraging the network predictions to be consistent with the reliable memory predictions derived from the memory of model learning.
  • The authors adopt the supervised counterpart, CNN-Supervised, trained using only the same labelled data (i.e. without the unlabelled data), as the baseline.
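
For concreteness, here is a minimal sketch of how the two memory loss terms in the ablation above could be computed on a batch of unlabelled samples. The exact definitions are those of Eqs (7)-(9) in the paper; it is only assumed here that ME is an entropy term on the memory prediction and MND a KL divergence pulling the network prediction towards the memory prediction, matching the consistency interpretation stated above.

    import numpy as np

    def entropy(p, eps=1e-12):
        """Shannon entropy of each row of a probability matrix."""
        return -(p * np.log(p + eps)).sum(axis=1)

    def kl_divergence(p, q, eps=1e-12):
        """Row-wise KL(p || q) between two probability matrices."""
        return (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=1)

    def memory_loss(memory_pred, network_pred):
        """Illustrative memory loss on unlabelled samples:
        ME  - entropy term penalising uncertain, non-peaked predictions
              (applied here to the memory prediction; see Eq (8) for the paper's choice);
        MND - divergence term encouraging the network prediction to agree with
              the memory prediction (see Eq (9))."""
        me = entropy(memory_pred).mean()
        mnd = kl_divergence(memory_pred, network_pred).mean()
        return me + mnd

Removing the MND term in this sketch is exactly what the ablation isolates: nothing then encourages the network predictions to agree with the memory predictions, consistent with the reported drops.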
Results
  • CIFAR10 [13]: a natural image dataset containing 50,000/10,000 training/test image samples from 10 object classes, used alongside SVHN [20] and CIFAR100 [13] as the three benchmarks.
  • Following the standard semi-supervised classification protocol [12, 24, 30, 19], the authors randomly divide the training data into a small labelled set and a large unlabelled set (a sketch of such a split follows this list).
  • The number of labelled training images is 1,000/4,000/10,000 on SVHN/CIFAR10/CIFAR100 respectively, with the remaining 72,257/46,000/40,000 images used as unlabelled training data.
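
A minimal sketch of the labelled/unlabelled split described above; class-balanced sampling of the labelled set is an assumption about the protocol, and the helper name and seed are illustrative.

    import numpy as np

    def split_labelled_unlabelled(labels, num_labelled, num_classes, seed=0):
        """Randomly pick a class-balanced labelled subset; the rest is treated as
        unlabelled (its labels are discarded during training)."""
        rng = np.random.default_rng(seed)
        per_class = num_labelled // num_classes
        labelled_idx = []
        for c in range(num_classes):
            idx = np.flatnonzero(labels == c)
            labelled_idx.append(rng.choice(idx, size=per_class, replace=False))
        labelled_idx = np.concatenate(labelled_idx)
        unlabelled_idx = np.setdiff1d(np.arange(len(labels)), labelled_idx)
        return labelled_idx, unlabelled_idx

    # Example for CIFAR10: 4,000 labelled images, 46,000 treated as unlabelled.
    labels = np.repeat(np.arange(10), 5000)          # stand-in for the CIFAR10 training labels
    lab_idx, unlab_idx = split_labelled_unlabelled(labels, 4000, 10)
    assert len(lab_idx) == 4000 and len(unlab_idx) == 46000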
Conclusion
  • The authors present a novel Memory-Assisted Deep Neural Network (MA-DNN) to enable semi-supervised deep learning on sparsely labelled and abundant unlabelled training data.
  • The MA-DNN is established on the idea of exploiting the memory of model learning to more reliably and effectively learn from the unlabelled training data.
  • The authors formulate a novel assimilation-accommodation interaction between the network and an external memory module, capable of facilitating more effective semi-supervised deep learning by imposing a memory loss derived from the incrementally updated memory module.
  • The authors provide detailed ablation studies and further analysis to give insights into the model design and performance gains.
Tables
  • Table 1: Evaluation on semi-supervised image classification benchmarks in comparison to state-of-the-art methods. Metric: error rate (%) ± standard deviation, lower is better. “–” indicates no reported result. “∗” indicates generative models.
  • Table 2: Evaluation of the effect of individual memory loss terms. Metric: error rate (%) ± standard deviation, lower is better. ME: Model Entropy; MND: Memory-Network Divergence.
Related work
  • Semi-supervised deep learning has recently gained increasing attention due to the strong generalisation power of deep neural networks [35, 15, 12, 30, 24, 19, 14]. A common strategy is to train deep neural networks by simultaneously optimising a standard supervised classification loss on labelled samples along with an additional unsupervised loss term imposed on either unlabelled data [15, 27, 5] or both labelled and unlabelled data [35, 24, 19, 14]. These additional loss terms are considered unsupervised supervision signals, since no ground-truth label is required to derive their values. For example, Lee [15] utilises the cross-entropy loss computed on the pseudo labels (the classes with the maximum predicted probability given by the up-to-date network) of unlabelled samples as an additional supervision signal. Rasmus et al. [24] adopt the reconstruction loss between one clean forward propagation and one stochastically corrupted forward propagation of the same sample. Miyato et al. [19] define distributional smoothness against local random perturbation as an unsupervised penalty. Laine et al. [14] introduce an unsupervised L2 loss to penalise the inconsistency between the network predictions and the temporally ensembled network predictions (a sketch of such a consistency penalty follows). Overall, the rationale of these SSL algorithms is to regularise the network by enforcing smooth and consistent classification boundaries that are robust to random perturbation [24, 19], or to enrich the supervision signals by exploiting the knowledge learned by the network, such as the pseudo labels [15] or the temporally ensembled predictions [14].
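
To illustrate the temporally ensembled consistency signal of Laine et al. [14] mentioned above, here is a minimal sketch under simplifying assumptions (the EMA decay, bias correction and update granularity are not taken from [14]): past network predictions are averaged per sample and used as targets of an unsupervised L2 penalty.

    import numpy as np

    class TemporalEnsembleTarget:
        """Illustrative temporally ensembled targets: an exponential moving average
        of past network predictions, stored per training sample (hence O(num_samples)
        memory, the cost the MA-DNN memory module avoids)."""

        def __init__(self, num_samples, num_classes, alpha=0.6):
            self.ensemble = np.zeros((num_samples, num_classes))
            self.alpha = alpha
            self.step = 0

        def update_and_get(self, sample_idx, preds):
            """Blend new predictions into the running ensemble and return
            bias-corrected targets for the consistency loss."""
            self.step += 1
            self.ensemble[sample_idx] = (self.alpha * self.ensemble[sample_idx]
                                         + (1 - self.alpha) * preds)
            return self.ensemble[sample_idx] / (1 - self.alpha ** self.step)

    def consistency_loss(preds, targets):
        """Unsupervised L2 penalty between current predictions and ensembled targets."""
        return np.mean((preds - targets) ** 2)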
Funding
  • This work was partly supported by the China Scholarship Council, Vision Semantics Limited, the Royal Society Newton Advanced Fellowship Programme (NA150459) and Innovate UK Industrial Challenge Project on Developing and Commercialising Intelligent Video Analytics Solutions for Public Safety (98111571149)
References
  • [1] Blum, A., Lafferty, J., Rwebangira, M.R., Reddy, R.: Semi-supervised learning using randomized mincuts. In: International Conference on Machine Learning (2004)
  • [2] Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Annual Conference on Computational Learning Theory. ACM (1998)
  • [3] Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: Tenth International Workshop on Artificial Intelligence and Statistics (2005)
  • [4] Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. The MIT Press (2010)
  • [5] Dumoulin, V., Belghazi, I., Poole, B., Lamb, A., Arjovsky, M., Mastropietro, O., Courville, A.: Adversarially learned inference. In: International Conference on Learning Representations (2017)
  • [6] Fergus, R., Weiss, Y., Torralba, A.: Semi-supervised learning in gigantic image collections. In: Advances in Neural Information Processing Systems (2009)
  • [7] Ginsburg, H.P., Opper, S.: Piaget's Theory of Intellectual Development. Prentice-Hall (1988)
  • [8] Grandvalet, Y., Bengio, Y.: Semi-supervised learning by entropy minimization. In: Advances in Neural Information Processing Systems (2005)
  • [9] Haeusser, P., Mordvintsev, A., Cremers, D.: Learning by association: a versatile semi-supervised training method for neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • [10] Joachims, T.: Transductive inference for text classification using support vector machines. In: International Conference on Machine Learning (1999)
  • [11] Kaiser, L., Nachum, O., Roy, A., Bengio, S.: Learning to remember rare events. In: International Conference on Learning Representations (2017)
  • [12] Kingma, D.P., Mohamed, S., Rezende, D.J., Welling, M.: Semi-supervised learning with deep generative models. In: Advances in Neural Information Processing Systems (2014)
  • [13] Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)
  • [14] Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. In: International Conference on Learning Representations (2017)
  • [15] Lee, D.H.: Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In: ICML Workshop on Challenges in Representation Learning (2013)
  • [16] Maaløe, L., Sønderby, C.K., Sønderby, S.K., Winther, O.: Auxiliary deep generative models. In: International Conference on Machine Learning (2016)
  • [17] Maaten, L.v.d., Hinton, G.: Visualizing data using t-SNE. Journal of Machine Learning Research (2008)
  • [18] Miller, A., Fisch, A., Dodge, J., Karimi, A.H., Bordes, A., Weston, J.: Key-value memory networks for directly reading documents. In: Conference on Empirical Methods in Natural Language Processing (2016)
  • [19] Miyato, T., Maeda, S., Koyama, M., Nakae, K., Ishii, S.: Distributional smoothing with virtual adversarial training. In: International Conference on Learning Representations (2016)
  • [20] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011)
  • [21] Nigam, K., Ghani, R.: Analyzing the effectiveness and applicability of co-training. In: International Conference on Information and Knowledge Management. ACM (2000)
  • [22] Pereyra, G., Tucker, G., Chorowski, J., Kaiser, L., Hinton, G.: Regularizing neural networks by penalizing confident output distributions. In: International Conference on Learning Representations (2017)
  • [23] Ranzato, M., Szummer, M.: Semi-supervised learning of compact document representations with deep networks. In: International Conference on Machine Learning (2008)
  • [24] Rasmus, A., Berglund, M., Honkala, M., Valpola, H., Raiko, T.: Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems (2015)
  • [25] Rosenberg, C., Hebert, M., Schneiderman, H.: Semi-supervised self-training of object detection models. In: IEEE Workshop on Applications of Computer Vision (2005)
  • [26] Sajjadi, M., Javanmardi, M., Tasdizen, T.: Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In: Advances in Neural Information Processing Systems (2016)
  • [27] Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems (2016)
  • [28] Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.: Meta-learning with memory-augmented neural networks. In: International Conference on Machine Learning (2016)
  • [29] Shi, M., Zhang, B.: Semi-supervised learning improves gene expression-based prediction of cancer recurrence. Bioinformatics 27(21) (2011)
  • [30] Springenberg, J.T.: Unsupervised and semi-supervised learning with categorical generative adversarial networks. In: International Conference on Learning Representations (2016)
  • [31] Sukhbaatar, S., Weston, J., Fergus, R.: End-to-end memory networks. In: Advances in Neural Information Processing Systems (2015)
  • [32] Tarvainen, A., Valpola, H.: Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: Advances in Neural Information Processing Systems (2017)
  • [33] Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: European Conference on Computer Vision (2016)
  • [34] Weston, J., Chopra, S., Bordes, A.: Memory networks. In: International Conference on Learning Representations (2014)
  • [35] Weston, J., Ratle, F., Mobahi, H., Collobert, R.: Deep learning via semi-supervised embedding. In: International Conference on Machine Learning (2008)
  • [36] Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. In: Advances in Neural Information Processing Systems (2004)
  • [37] Zhu, X.: Semi-supervised learning literature survey. Technical report, Computer Sciences, University of Wisconsin-Madison (2006)
  • [38] Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University (2002)
  • [39] Zhu, X., Ghahramani, Z., Lafferty, J.D.: Semi-supervised learning using Gaussian fields and harmonic functions. In: International Conference on Machine Learning (2003)