Adversarial Dropout for Supervised and Semi-supervised Learning

AAAI Conference on Artificial Intelligence, 2018.

Keywords:
fast gradient sign method; deep neural networks; supervised adversarial dropout; virtual adversarial training; adversarial example

Abstract:

Recently, training with adversarial examples, which are generated by adding a small but worst-case perturbation to input examples, has been shown to improve the generalization performance of neural networks. In contrast to such individually biased inputs used to enhance generality, this paper introduces adversarial dropout, which is a mini…

Introduction
  • Deep neural networks (DNNs) have demonstrated significant improvements in benchmark performance across a wide range of applications.
  • The earlier work by Hinton et al. (2012) and Srivastava et al. (2014) interpreted dropout as an extreme form of model combination, a.k.a. a model ensemble, that shares extensive parameters across neural networks.
  • They proposed learning the model combination by minimizing an expected loss over models perturbed by dropout.
  • Extending the weight-sharing perspective, several studies (Baldi and Sadowski 2013; Chen et al. 2014; Jain et al. 2015) analyzed the ensemble effects of dropout; a minimal sketch of standard dropout appears after this list.
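To make the model-combination reading concrete, here is a minimal PyTorch sketch of standard (inverted) dropout: each forward pass samples a binary mask, so training minimizes an expected loss over randomly thinned sub-networks that share weights. The function name and shapes are illustrative, not from the paper.

```python
import torch

def dropout_layer(h, p=0.5, training=True):
    """Inverted dropout: keep each unit with probability 1 - p and
    rescale, so activations keep the same expected magnitude."""
    if not training:
        return h
    mask = (torch.rand_like(h) > p).float() / (1.0 - p)
    return h * mask  # each call samples one member of the implicit ensemble
```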
Highlights
  • Deep neural networks (DNNs) have demonstrated significant improvements in benchmark performance across a wide range of applications.
  • Afterwards, we review adversarial training and temporal ensembling (the Π model), because the two methods are closely related to adversarial dropout.
  • We obtain the adversarial dropout condition ε^adv that maximizes the divergence under the boundary constraint, and we evaluate the loss function L_AdD; a sketch of this mask search appears after this list.
  • We found that adversarial dropout yields fewer highly activated units than the alternatives.
  • To investigate the distinct properties of adversarial dropout, we explore a very simple case: applying adversarial training and adversarial dropout to linear regression.
  • The experiments showed that generalization performance is improved by applying our adversarial dropout.
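The following is a minimal sketch, under our reading of the method, of how an adversarial dropout mask can be approximated with one gradient evaluation: score each hidden unit by the first-order effect of keeping it active on the divergence D, then flip at most a delta fraction of the base mask's entries toward the divergence-increasing configuration. The function name, the greedy top-k flip rule, and the tensor handling are our assumptions, not the paper's exact algorithm.

```python
import torch

def adversarial_dropout_mask(h, grad, base_mask, delta=0.05):
    """Greedy first-order approximation of the adversarial mask search
    (a sketch, not the paper's verbatim procedure).
    h: layer activations; grad: dD/dh for the divergence D;
    base_mask: 0/1 dropout mask sampled as usual;
    delta: fraction of entries allowed to differ from base_mask."""
    h_f = h.reshape(-1)
    g_f = grad.reshape(-1)
    m_f = base_mask.reshape(-1).clone()
    score = g_f * h_f                  # first-order effect of keeping unit i active
    desired = (score > 0).float()      # mask that locally maximizes the divergence
    flip = desired != m_f
    budget = int(delta * m_f.numel())  # Hamming-distance boundary constraint
    if int(flip.sum()) > budget:
        # Keep only the most impactful flips within the boundary.
        idx = torch.topk(score.abs() * flip.float(), budget).indices
        m_f[idx] = desired[idx]
    else:
        m_f = desired
    return m_f.reshape(base_mask.shape)
```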
Methods
  • The authors present adversarial dropout, which combines the ideas of adversarial training and dropout.
  • When training the autoencoder, the authors set the dropout probability to p = 0.5 and used the reconstruction error between the input data and the output layer as the loss function for updating the weights under standard dropout.
  • The adversarial dropout error is additionally considered when updating the autoencoder weights, with parameters λ = 0.2 and δ = 0.3; a sketch of this combined objective follows this list.
  • The trained autoencoders showed similar reconstruction errors on the test dataset.
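As a concrete, hypothetical rendering of this training setup, the sketch below combines the standard-dropout reconstruction error with a λ-weighted reconstruction error under a given adversarial mask. The layer sizes, the squared-error form, and the `adv_mask` argument are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

enc, dec = nn.Linear(784, 256), nn.Linear(256, 784)  # hypothetical sizes

def autoencoder_loss(x, adv_mask, lam=0.2):
    """Standard-dropout reconstruction error plus a lambda-weighted
    adversarial-dropout reconstruction error (our reading of the text);
    adv_mask is the mask found under the delta = 0.3 boundary."""
    h = torch.relu(enc(x))
    rand_mask = (torch.rand_like(h) > 0.5).float()     # standard dropout, p = 0.5
    loss_std = ((dec(h * rand_mask) - x) ** 2).mean()  # random-mask term
    loss_adv = ((dec(h * adv_mask) - x) ** 2).mean()   # adversarial-mask term
    return loss_std + lam * loss_adv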
Results
  • When applying VAT and VAdD together by adding their divergence terms to the loss function (see Formula 7), the authors achieved state-of-the-art performance in semi-supervised learning on both datasets: a test error rate of 3.55% on SVHN, and test error rates of 10.04% and 9.22% on CIFAR-10. A sketch of the combined regularizer follows.
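A hedged sketch of what "adding their divergence terms" (Formula 7) might look like: KL divergences between the clean prediction and the predictions under (i) a virtual adversarial input perturbation (VAT) and (ii) an adversarial dropout mask (VAdD), each with its own weight. The argument names and the weights λ₁, λ₂ are assumptions; the paper also reports a quadratic-error (QE) variant of the divergence D.

```python
import torch
import torch.nn.functional as F

def combined_divergence_loss(logits_clean, logits_vat, logits_vadd,
                             lambda_1=1.0, lambda_2=1.0):
    """Sketch of a Formula-7-style regularizer: the three logits are
    assumed to come from three forward passes of the same network
    (clean, adversarial input perturbation, adversarial dropout)."""
    p_clean = F.softmax(logits_clean, dim=1).detach()  # fixed target distribution
    kl_vat = F.kl_div(F.log_softmax(logits_vat, dim=1), p_clean,
                      reduction="batchmean")           # VAT divergence term
    kl_vadd = F.kl_div(F.log_softmax(logits_vadd, dim=1), p_clean,
                       reduction="batchmean")          # VAdD divergence term
    return lambda_1 * kl_vat + lambda_2 * kl_vadd
```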
Conclusion
  • The key point of the paper is combining ideas from adversarial training and dropout.
  • Existing adversarial training methods control a linear, additive perturbation on the input layer only.
  • The authors instead combined the concept of perturbation with dropout on hidden layers.
  • The experiments showed that generalization performance is improved by applying adversarial dropout.
  • The authors' approach achieved state-of-the-art performance of 3.55% on SVHN and 9.22% on CIFAR-10 by applying VAdD and VAT together for semi-supervised learning.
Tables
  • Table 1: Test performance with 1,000 labeled (semi-supervised) and 60,000 labeled (supervised) examples on MNIST. Each setting is repeated eight times.
  • Table 2: Test performance of semi-supervised and supervised learning on SVHN and CIFAR-10. Each setting is repeated five times. KL and QE indicate Kullback-Leibler divergence and quadratic error, respectively, specifying the divergence function D[·, ·].
Funding
  • This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2017R1D1A1A01058209).
References
  • Bachman, P.; Alsharif, O.; and Precup, D. 2014. Learning with pseudo-ensembles. In Advances in Neural Information Processing Systems 27. Curran Associates, Inc. 3365–3373.
  • Baldi, P., and Sadowski, P. J. 2013. Understanding dropout. In Advances in Neural Information Processing Systems. 2814–2822.
  • Bishop, C. M. 1995a. Training with noise is equivalent to Tikhonov regularization. Neural Computation 7(1):108–116.
  • Bishop, C. M. 1995b. Regularization and complexity control in feed-forward networks.
  • Chen, N.; Zhu, J.; Chen, J.; and Zhang, B. 2014. Dropout training for support vector machines. arXiv preprint arXiv:1404.4171.
  • Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
  • He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Identity mappings in deep residual networks. In European Conference on Computer Vision, 630–645. Springer.
  • Hemmecke, R.; Köppe, M.; Lee, J.; and Weismantel, R. 2010. Nonlinear integer programming. In 50 Years of Integer Programming 1958–2008. Springer. 561–618.
  • Hinton, G. E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. R. 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
  • Huang, G.; Liu, Z.; Weinberger, K. Q.; and van der Maaten, L. 2016. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993.
  • Jain, P.; Kulkarni, V.; Thakurta, A.; and Williams, O. 2015. To drop or not to drop: Robustness, consistency and differential privacy properties of dropout. arXiv preprint arXiv:1503.02031.
  • Kellerer, H.; Pferschy, U.; and Pisinger, D. 2004. Introduction to NP-completeness of knapsack problems. In Knapsack Problems. Springer. 483–493.
  • Krizhevsky, A., and Hinton, G. 2009. Learning multiple layers of features from tiny images.
  • Kurakin, A.; Goodfellow, I.; and Bengio, S. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
  • Laine, S., and Aila, T. 2016. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242.
  • Lasserre, J. A.; Bishop, C. M.; and Minka, T. P. 2006. Principled hybrids of generative and discriminative models. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 1, 87–94. IEEE.
  • LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324.
  • Li, Z.; Gong, B.; and Yang, T. 2016. Improved dropout for shallow and deep learning. In Advances in Neural Information Processing Systems, 2523–2531.
  • Maaten, L.; Chen, M.; Tyree, S.; and Weinberger, K. Q. 2013. Learning with marginalized corrupted features. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 410–418.
  • Miyato, T.; Maeda, S.-i.; Koyama, M.; Nakae, K.; and Ishii, S. 2015. Distributional smoothing with virtual adversarial training. arXiv preprint arXiv:1507.00677.
  • Miyato, T.; Maeda, S.-i.; Koyama, M.; and Ishii, S. 2017. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. arXiv preprint arXiv:1704.03976.
  • Miyato, T.; Dai, A. M.; and Goodfellow, I. 2016. Virtual adversarial training for semi-supervised text classification. stat 1050:25.
  • Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; and Ng, A. Y. 2011. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, 5.
  • Poole, B.; Sohl-Dickstein, J.; and Ganguli, S. 2014. Analyzing noise in autoencoders and deep networks. arXiv preprint arXiv:1406.1831.
  • Rasmus, A.; Berglund, M.; Honkala, M.; Valpola, H.; and Raiko, T. 2015. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, 3546–3554.
  • Sajjadi, M.; Javanmardi, M.; and Tasdizen, T. 2016. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In Advances in Neural Information Processing Systems, 1163–1171.
  • Srivastava, N.; Hinton, G. E.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1):1929–1958.
  • Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
  • Wager, S.; Wang, S.; and Liang, P. S. 2013. Dropout training as adaptive regularization. In Advances in Neural Information Processing Systems, 351–359.
  • Wang, S., and Manning, C. 2013. Fast dropout training. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 118–126.