Small-GAN: Speeding Up GAN Training Using Core-sets

ICML 2020 (preprint 2019).

Keywords:
GAN training, Self-Attention GANs, Fréchet Inception Distance, batch size, core-set selection

Abstract:

Recent work by Brock et al. (2018) suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large mini-batch sizes. Unfortunately, using large batches is slow and expensive on conventional hardware. Thus, it would be nice if we could generate batches that were effectively large though actually small. In this ...

Introduction
  • Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have become a popular research topic.
  • Inception Score and Fréchet Inception Distance: The authors refer frequently to the Fréchet Inception Distance (FID) (Heusel et al., 2017) to measure the effectiveness of an image synthesis model.
  • To compute this distance, one assumes access to a pre-trained Inception classifier.
  • If the activations on the real data are distributed as N(m, C) and the activations on the fake data as N(m_w, C_w), the FID is defined as ||m - m_w||^2 + Tr(C + C_w - 2(C C_w)^{1/2}); a minimal computation sketch follows this list.
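The following is a minimal NumPy/SciPy sketch of the FID computation defined above. The function name and the assumption that real_acts and fake_acts are activation arrays of shape (num_samples, feature_dim) are illustrative choices, not taken from the paper's code.

```python
# Minimal sketch of the FID between two sets of Inception activations.
# Assumes real_acts and fake_acts are NumPy arrays of shape
# (num_samples, feature_dim); the function name is illustrative.
import numpy as np
from scipy.linalg import sqrtm


def frechet_inception_distance(real_acts, fake_acts):
    # Fit Gaussians N(m, C) and N(m_w, C_w) to the two activation sets.
    m, c = real_acts.mean(axis=0), np.cov(real_acts, rowvar=False)
    m_w, c_w = fake_acts.mean(axis=0), np.cov(fake_acts, rowvar=False)

    # Matrix square root of C C_w; drop the small imaginary parts that can
    # appear from numerical error.
    covmean = sqrtm(c @ c_w)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    # ||m - m_w||^2 + Tr(C + C_w - 2 (C C_w)^{1/2})
    return float(np.sum((m - m_w) ** 2) + np.trace(c + c_w - 2.0 * covmean))
```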
Highlights
  • Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have become a popular research topic
  • We introduce a simple, computationally cheap method to increase the ‘effective batch size’ of GANs, which can be applied to any GAN variant (a sketch of the greedy Core-set selection step appears after this list)
  • We use our method to improve the technique of Kumar et al. (2019), achieving state-of-the-art results on GAN-based anomaly detection
  • We test our method on standard image synthesis benchmarks and confirm that our technique substantially reduces the need for large mini-batches in GAN training
  • We investigate the performance of training a GAN to recover different numbers of modes of a mixture of 2D isotropic Gaussians, each with a standard deviation of 0.05
  • In this work we present a general way to mimic using a large batch size in GANs while minimizing computational overhead
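The Core-set selection referred to above is typically built with a greedy k-center (farthest-point) heuristic, as in the active-learning work of Sener & Savarese (2017) cited by the paper. The sketch below is a generic version of that heuristic under the assumption of Euclidean distances; the function name is illustrative and it is not claimed to match the authors' exact implementation.

```python
# Greedy k-center (farthest-point) selection: repeatedly add the point
# that is farthest from the points selected so far. points is assumed to
# be a NumPy array of shape (n, d); returns k selected row indices.
import numpy as np


def greedy_coreset_indices(points, k, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = points.shape[0]

    # Seed the selection with a random point.
    selected = [int(rng.integers(n))]
    min_dist = np.linalg.norm(points - points[selected[0]], axis=1)

    for _ in range(k - 1):
        # Pick the point with the largest distance to its nearest
        # already-selected point (max-min criterion).
        next_idx = int(np.argmax(min_dist))
        selected.append(next_idx)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(points - points[next_idx], axis=1)
        )
    return np.asarray(selected)
```

Running this on a pool of samples several times larger than the desired batch yields a subset that covers the pool more evenly than uniform sampling, which is the intuition behind the ‘effectively large’ batch.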
Methods
  • The authors evaluate the proposed sampling method on several tasks. In the first experiment, they train a GAN on a Gaussian mixture dataset with a large number of modes and confirm that the method substantially mitigates ‘mode-dropping’.
  • The authors test the method on standard image synthesis benchmarks and confirm that the technique substantially reduces the need for large mini-batches in GAN training.
  • For over-sampling, the authors use a factor of 4 for the prior p(z) and a factor of 8 for the target, p(x), unless otherwise stated.
  • The authors investigate the effects of different over-sampling factors in the ablation study in Section 4.6; a sketch of how these factors enter mini-batch construction follows this list.
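As a rough illustration of how the over-sampling factors above could be used, the sketch below assembles one mini-batch: the prior is over-sampled 4x and the data 8x, Core-set selection is run on the latent vectors directly and on randomly projected embeddings of the images, and the selected items form the training batch. sample_prior, sample_data, and embed are hypothetical helpers, greedy_coreset_indices is the function sketched after the Highlights, and proj_dim is an assumed value; this is a sketch of the described procedure under those assumptions, not the authors' code.

```python
# Sketch of 'effectively large' mini-batch construction via over-sampling
# plus Core-set selection. sample_prior, sample_data, and embed are
# hypothetical callables; greedy_coreset_indices is the earlier sketch.
import numpy as np


def coreset_minibatch(sample_prior, sample_data, embed, batch_size,
                      prior_factor=4, target_factor=8, proj_dim=32,
                      rng=None):
    rng = np.random.default_rng() if rng is None else rng

    # Over-sample the prior, then keep a Core-set of batch_size latents.
    z_pool = sample_prior(prior_factor * batch_size)          # (4B, dz)
    z_batch = z_pool[greedy_coreset_indices(z_pool, batch_size, rng)]

    # Over-sample the data; select on a random low-dimensional projection
    # of the image embeddings, then keep the corresponding images.
    x_pool = sample_data(target_factor * batch_size)          # (8B, ...)
    e_pool = embed(x_pool)                                     # (8B, de)
    proj = rng.standard_normal((e_pool.shape[1], proj_dim)) / np.sqrt(proj_dim)
    x_batch = x_pool[greedy_coreset_indices(e_pool @ proj, batch_size, rng)]

    return z_batch, x_batch
```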
Results
  • The authors use the method to improve the technique of Kumar et al. (2019), achieving state-of-the-art results on GAN-based anomaly detection.
  • The results suggest that, for any given batch size, the models perform significantly better when Core-set sampling is used.
Conclusion
  • In this work the authors present a general way to mimic using a large batch size in GANs while minimizing computational overhead.
  • This technique uses Core-set selection and improves performance in a wide variety of contexts.
  • This work suggests further research: a similar method could be applied to other learning tasks where large mini-batches may be useful.
Tables
  • Table 1: Experiments with a large number of modes
  • Table 2: Experiments with anomaly detection on the MNIST dataset. The held-out digit is the digit removed from the training set and treated as the anomaly class. The numbers reported are the area under the precision-recall curve
  • Table 3: FID scores for CIFAR using SN-GAN as the batch size is progressively doubled. The FID score is calculated using 50,000 generated samples from the generator
  • Table 4: FID scores for LSUN using SAGAN as the batch size is progressively doubled. The FID score is calculated using 50,000 generated samples from the generator. All experiments were run on the ‘outdoor church’ subset of the dataset
  • Table 5: Time to perform 50 gradient updates for SN-GAN with and without Core-sets, measured in seconds. All experiments were performed on a single NVIDIA Titan Xp GPU. The sampling factor was 4 for the prior and 8 for the target distribution
  • Table 6: FID scores for CIFAR using SN-GAN. The experiment list is: A = training an SN-GAN, B = Core-set selection directly on the images, C = Core-set selection applied directly on Inception embeddings without a random projection, D = Core-set selection applied only to the prior distribution, E = Core-set selection applied only to the target distribution
  • Table 7: FID scores for CIFAR using SN-GAN. Each experiment uses a different pair of over-sampling factors for the prior and target distributions, listed as sampling factor for the prior distribution × sampling factor for the target distribution. A = 2 × 2; B = 2 × 4; C = 4 × 2; D = 4 × 4; E = 8 × 4; F = 4 × 8; G = 8 × 8; H = 16 × 16; I = 32 × 32
Reference
  • Pankaj K Agarwal, Sariel Har-Peled, and Kasturi R Varadarajan. Geometric approximation via coresets. Combinatorial and computational geometry, 52:1–30, 2005.
  • Martin Arjovsky, Soumith Chintala, and Leon Bottou. Wasserstein gan. arXiv preprint arXiv:1701.07875, 2017.
  • Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in generative adversarial nets (gans). In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 224–232. JMLR. org, 2017.
  • Sanjeev Arora, Andrej Risteski, and Yi Zhang. Do gans learn the distribution? some theory and empirics. 2018.
  • Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, and Augustus Odena. Discriminator rejection sampling. arXiv preprint arXiv:1810.06758, 2018.
  • Mihai Badoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pp. 250–257. ACM, 2002.
  • Francisco Barahona and FabiaN A Chudak. Near-optimal solutions to large-scale facility location problems. Discrete Optimization, 2(1):35–50, 2005.
  • Marc G Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, and Remi Munos. The cramer distance as a solution to biased wasserstein gradients. arXiv preprint arXiv:1705.10743, 2017.
  • Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
  • Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3):15, 2009.
  • Tatjana Chavdarova and Francois Fleuret. Sgan: An alternative training of generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9407–9415, 2018.
  • Tatjana Chavdarova, Gauthier Gidel, Francois Fleuret, and Simon Lacoste-Julien. Reducing noise in gan training with variance reduced extragradient. arXiv preprint arXiv:1904.08598, 2019.
  • Kenneth L Clarkson. Coresets, sparse greedy approximation, and the frank-wolfe algorithm. ACM Transactions on Algorithms (TALG), 6(4):63, 2010.
  • Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of johnson and lindenstrauss. Random Structures & Algorithms, 22(1):60–65, 2003.
  • David L Donoho et al. High-dimensional data analysis: The curses and blessings of dimensionality. AMS math challenges lecture, 1(2000):32, 2000.
  • Ishan Durugkar, Ian Gemp, and Sridhar Mahadevan. Generative multi-adversarial networks. arXiv preprint arXiv:1611.01673, 2016.
  • Ahmet M Eskicioglu and Paul S Fisher. Image quality measures and their performance. IEEE Transactions on communications, 43(12):2959–2965, 1995.
  • Reza Zanjirani Farahani and Masoud Hekmatfar. Facility location: concepts, models, algorithms and case studies. Springer, 2009.
  • William Fedus, Ian Goodfellow, and Andrew M Dai. Maskgan: better text generation via filling in the. arXiv preprint arXiv:1801.07736, 2018.
  • Dan Feldman, Matthew Faulkner, and Andreas Krause. Scalable training of mixture models via coresets. In Advances in neural information processing systems, pp. 2142–2150, 2011.
  • Gauthier Gidel, Hugo Berard, Gaetan Vignoud, Pascal Vincent, and Simon Lacoste-Julien. A variational inequality perspective on generative adversarial networks. arXiv preprint arXiv:1802.10551, 2018.
  • Bernd Girod. What’s wrong with mean-squared error? Digital images and human vision, pp. 207–220, 1993.
  • AJ Goldman. Optimal center location in simple networks. Transportation science, 5(2):212–221, 1971.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014.
  • Priya Goyal, Piotr Dollar, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
  • Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans. In Advances in neural information processing systems, pp. 5767–5777, 2017.
  • Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. Long text generation via adversarial training with leaked information. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • Sariel Har-Peled and Akash Kushal. Smaller coresets for k-median and k-means clustering. Discrete & Computational Geometry, 37(1):3–19, 2007.
  • Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pp. 291–300. ACM, 2004.
  • Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6626–6637, 2017.
  • Jonathan Huggins, Trevor Campbell, and Tamara Broderick. Coresets for scalable bayesian logistic regression. In Advances in Neural Information Processing Systems, pp. 4080–4088, 2016.
  • Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134, 2017.
  • Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836, 2016.
  • Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
  • Rithesh Kumar, Anirudh Goyal, Aaron Courville, and Yoshua Bengio. Maximum entropy generators for energy-based models. arXiv preprint arXiv:1901.08508, 2019.
  • Donghwoon Kwon, Hyunjoo Kim, Jinoh Kim, Sang C Suh, Ikkyun Kim, and Kuinam J Kim. A survey of deep learning-based network anomaly detection. Cluster Computing, pp. 1–13, 2017.
  • Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681–4690, 2017.
  • Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, and Barnabas Poczos. Mmd gan: Towards deeper understanding of moment matching network. In Advances in Neural Information Processing Systems, pp. 2203–2213, 2017a.
  • Jerry Li, Aleksander Madry, John Peebles, and Ludwig Schmidt. Towards understanding the dynamics of generative adversarial networks. arXiv preprint arXiv:1706.09884, 2017b.
  • Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2794–2802, 2017.
  • Lars Mescheder. On the convergence properties of gan training. arXiv preprint arXiv:1801.04406, 1:16, 2018.
  • Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. Which training methods for gans do actually converge? arXiv preprint arXiv:1801.04406, 2018.
  • Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
  • Youssef Mroueh and Tom Sercu. Fisher gan. In Advances in Neural Information Processing Systems, pp. 2513–2523, 2017.
  • Ben Mussay, Samson Zhou, Vladimir Braverman, and Dan Feldman. On activation function coresets for network pruning. arXiv preprint arXiv:1907.04018, 2019.
  • Vaishnavh Nagarajan and J Zico Kolter. Gradient descent gan optimization is locally stable. In Advances in Neural Information Processing Systems, pp. 5585–5595, 2017.
  • Cuong V Nguyen, Yingzhen Li, Thang D Bui, and Richard E Turner. Variational continual learning. arXiv preprint arXiv:1710.10628, 2017.
  • Augustus Odena. Open questions about generative adversarial networks. Distill, 2019. doi:10.23915/distill.00018. https://distill.pub/2019/gan-open-problems.
  • Catherine Olsson, Surya Bhupatiraju, Tom Brown, Augustus Odena, and Ian Goodfellow. Skill rating for generative models. arXiv preprint arXiv:1808.04888, 2018.
  • Jeff M Phillips. Coresets and sketches. arXiv preprint arXiv:1601.00617, 2016.
  • Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • Tim Salimans, Han Zhang, Alec Radford, and Dimitris Metaxas. Improving gans using optimal transport. arXiv preprint arXiv:1803.05573, 2018.
  • Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach. arXiv preprint arXiv:1708.00489, 2017.
  • Christopher J Shallue, Jaehoon Lee, Joe Antognini, Jascha Sohl-Dickstein, Roy Frostig, and George E Dahl. Measuring the effects of data parallelism on neural network training. arXiv preprint arXiv:1811.03600, 2018.
  • Samarth Sinha, Sayna Ebrahimi, and Trevor Darrell. Variational adversarial active learning. arXiv preprint arXiv:1904.00370, 2019.
  • Samuel L Smith, Pieter-Jan Kindermans, Chris Ying, and Quoc V Le. Don’t decay the learning rate, increase the batch size. arXiv preprint arXiv:1711.00489, 2017.
  • Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • Ivor W Tsang, James T Kwok, and Pak-Ming Cheung. Core vector machines: Fast svm training on very large data sets. Journal of Machine Learning Research, 6(Apr):363–392, 2005.
  • Ivor W Tsang, Andras Kocsor, and James T Kwok. Simpler core vector machines with enclosing balls. In Proceedings of the 24th international conference on Machine learning, pp. 911–918. ACM, 2007.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008, 2017.
  • Kai Wei, Yuzong Liu, Katrin Kirchhoff, and Jeff Bilmes. Using document summarization techniques for speech data subset selection. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 721–726, 2013.
  • Laurence A Wolsey and George L Nemhauser. Integer and combinatorial optimization. John Wiley & Sons, 2014.
  • Yongqin Xian, Tobias Lorenz, Bernt Schiele, and Zeynep Akata. Feature generating networks for zero-shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5542–5551, 2018.
  • Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in neural information processing systems, pp. 3320–3328, 2014.
  • Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
  • Houssam Zenati, Chuan Sheng Foo, Bruno Lecouat, Gaurav Manek, and Vijay Ramaseshan Chandrasekhar. Efficient gan-based anomaly detection. arXiv preprint arXiv:1802.06222, 2018.
  • Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5907–5915, 2017.
  • Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318, 2018.
  • Han Zhang, Zizhao Zhang, Augustus Odena, and Honglak Lee. Consistency regularization for generative adversarial networks, 2019.
  • Jia-Jie Zhu and Jose Bento. Generative adversarial active learning. arXiv preprint arXiv:1702.07956, 2017.
  • Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223–2232, 2017.