# Small-GAN: Speeding Up GAN Training Using Core-sets

ICML 2020 (arXiv preprint, 2019).

Abstract:

Recent work by Brock et al. (2018) suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large mini-batch sizes. Unfortunately, using large batches is slow and expensive on conventional hardware. Thus, it would be nice if we could generate batches that were effectively large though actually small. In this work, the authors propose such a method, based on Core-set selection.

Introduction

- Generative Adversarial Networks (GANs) (Goodfellow et al, 2014) have become a popular research topic.
- Inception Score and Fréchet Inception Distance: The authors refer frequently to the Fréchet Inception Distance (FID) (Heusel et al, 2017) to measure the effectiveness of an image synthesis model.
- To compute this distance, one assumes access to a pre-trained Inception classifier.
- If the Inception activations on the real data are distributed as $\mathcal{N}(m, C)$ and the activations on the generated data as $\mathcal{N}(m_w, C_w)$, the FID is defined as $\lVert m - m_w \rVert_2^2 + \operatorname{Tr}\bigl(C + C_w - 2\,(C C_w)^{1/2}\bigr)$.
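This distance between the two Gaussian fits $\mathcal{N}(m, C)$ and $\mathcal{N}(m_w, C_w)$ has a simple closed form; below is a minimal NumPy/SciPy sketch (not the authors' code; SciPy's `sqrtm` is assumed available):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(m_r, C_r, m_f, C_f):
    """Frechet distance between N(m_r, C_r) and N(m_f, C_f):
    ||m_r - m_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2})."""
    covmean = sqrtm(C_r @ C_f)
    if np.iscomplexobj(covmean):   # sqrtm may leave tiny imaginary noise
        covmean = covmean.real
    diff = m_r - m_f
    return float(diff @ diff + np.trace(C_r + C_f - 2.0 * covmean))
```

In practice the means and covariances are estimated from Inception activations of real and generated samples; identical distributions give an FID of zero.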

Highlights

- Generative Adversarial Networks (GANs) (Goodfellow et al, 2014) have become a popular research topic
- We introduce a simple, computationally cheap method to increase the ‘effective batch size’ of GANs, which can be applied to any GAN variant
- We use our method to improve the performance of the technique from Kumar et al (2019), resulting in state-of-the-art performance at GAN-based anomaly detection
- We test our method on standard image synthesis benchmarks and confirm that our technique substantially reduces the need for large mini-batches in GAN training
- We investigate the performance of training a GAN to recover a varying number of modes of 2D isotropic Gaussian distributions with a standard deviation of 0.05
- In this work we present a general way to mimic using a large batch size in GANs while minimizing computational overhead
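The mode-recovery experiment mentioned above can be sketched as follows; the grid layout of the modes, the sample counts, and the 3-standard-deviation recovery criterion are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

def make_grid_mixture(side=10, std=0.05, n=10000, rng=None):
    """Mixture of side*side 2D isotropic Gaussians with std 0.05
    (the grid arrangement of centers is an assumption here)."""
    if rng is None:
        rng = np.random.default_rng(0)
    centers = np.stack(np.meshgrid(np.arange(side), np.arange(side)),
                       axis=-1).reshape(-1, 2).astype(float)
    idx = rng.integers(len(centers), size=n)
    samples = centers[idx] + rng.normal(scale=std, size=(n, 2))
    return samples, centers

def modes_recovered(samples, centers, std=0.05):
    """Count a mode as recovered if at least one sample falls within
    3 standard deviations of its center (an illustrative criterion)."""
    d = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=-1)
    return int(np.sum(d.min(axis=0) <= 3 * std))
```

Applying `modes_recovered` to generator samples instead of true mixture samples gives a rough measure of mode-dropping.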

Methods

- The authors look at the performance of the proposed sampling method on various tasks: In the first experiment, the authors train a GAN on a Gaussian mixture dataset with a large number of modes and confirm the method substantially mitigates ‘mode-dropping’.
- The authors test the method on standard image synthesis benchmarks and confirm that the technique substantially reduces the need for large mini-batches in GAN training.
- For over-sampling, the authors use a factor of 4 for the prior p(z) and a factor of 8 for the target, p(x), unless otherwise stated.
- The authors investigate the effects of different over-sampling factors in the ablation study in Section 4.6.
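The over-sampling scheme described above can be sketched with a greedy k-center Core-set selector: draw a batch several times larger than needed, then keep the subset that best covers it. The function name and dimensions below are illustrative, not the authors' implementation:

```python
import numpy as np

def coreset_select(points, k, rng=None):
    """Greedy k-center Core-set selection: repeatedly add the point
    farthest from everything selected so far, so the chosen subset
    covers the over-sampled batch."""
    if rng is None:
        rng = np.random.default_rng(0)
    chosen = [int(rng.integers(len(points)))]
    # distance from each point to its nearest selected point
    d = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]

# Over-sample the prior p(z) by a factor of 4, then compress the
# large batch back down to the true batch size.
batch_size, factor, z_dim = 64, 4, 128
z_large = np.random.default_rng(1).normal(size=(batch_size * factor, z_dim))
z_batch = coreset_select(z_large, batch_size)
```

The same selector would be applied on the target side with a factor of 8, operating on (projected) Inception embeddings rather than raw samples.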

Results

- The authors use the method to improve the performance of the technique from Kumar et al (2019), resulting in state-of-the-art performance at GAN-based anomaly detection.
- The results suggest that the models perform significantly better for any given batch size when Core-set sampling is used.
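The anomaly-detection results are reported as area under the precision-recall curve (see Table 2). A small self-contained sketch of that metric, with the held-out digit as label 1 and the model's anomaly score as input (not the authors' evaluation code):

```python
import numpy as np

def average_precision(labels, scores):
    """Area under the precision-recall curve (average precision):
    the precision evaluated at the rank of each true anomaly,
    averaged over the anomalies."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    order = np.argsort(-scores)          # rank by descending anomaly score
    labels = labels[order]
    cum_tp = np.cumsum(labels)           # true positives among the top-k
    precision = cum_tp / (np.arange(len(labels)) + 1)
    return float(precision[labels == 1].mean())
```

A perfect ranking (every anomaly scored above every normal example) yields 1.0, while a random ranking approaches the anomaly prevalence.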

Conclusion

- In this work the authors present a general way to mimic using a large batch-size in GANs while minimizing computational overhead.
- This technique uses Core-set selection and improves performance in a wide variety of contexts.
- This work suggests further research: a similar method could be applied to other learning tasks where large mini-batches may be useful.

Tables

- Table1: Experiments with large number of modes
- Table2: Experiments with Anomaly Detection on the MNIST dataset. The held-out digit is the digit removed from the training set and treated as the anomaly class. The numbers reported are the area under the precision-recall curve.
- Table3: FID scores for CIFAR using SN-GAN as the batch size is progressively doubled. The FID score is calculated using 50,000 generated samples from the generator.
- Table4: FID scores for LSUN using SAGAN as the batch size is progressively doubled. The FID score is calculated using 50,000 generated samples from the generator. All experiments were run on the ‘outdoor church’ subset of the dataset.
- Table5: Timing to perform 50 gradient updates for SN-GAN with and without Core-sets. The time is measured in seconds. All the experiments were performed on a single NVIDIA Titan-XP GPU. The sampling factor was 4 for the prior and 8 for the target distribution
- Table6: FID scores for CIFAR using SN-GAN. The experiment list is: A = Training an SN-GAN, B = Core-set selection directly on the images, C = Core-set applied directly on Inception embeddings without a random projection, D = Core-set applied only on the prior distribution, E = Core-set applied only on target distribution
- Table7: FID scores for CIFAR using SN-GAN. Each experiment uses a different pair of over-sampling factors for the prior and target distributions, listed as: sampling factor for prior distribution × sampling factor for target distribution. A = 2 × 2; B = 2 × 4; C = 4 × 2; D = 4 × 4; E = 8 × 4; F = 4 × 8; G = 8 × 8; H = 16 × 16; I = 32 × 32.
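Table 6 mentions applying Core-set selection to randomly projected Inception embeddings rather than to the high-dimensional embeddings directly. That projection step can be sketched as below; the output dimension and array shapes are illustrative assumptions:

```python
import numpy as np

def random_project(x, out_dim, seed=0):
    """Gaussian random projection (Johnson-Lindenstrauss style): it
    approximately preserves pairwise distances, which is all the
    Core-set distance computations need."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(x.shape[1], out_dim)) / np.sqrt(out_dim)
    return x @ P

# e.g. compress cached 2048-d Inception activations to 32 dimensions
acts = np.random.default_rng(1).normal(size=(256, 2048))
low = random_project(acts, 32)
```

Working in the projected space makes the repeated pairwise-distance computations inside Core-set selection far cheaper.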

Related work

5.1 VARIANCE REDUCTION IN GANS

Researchers have proposed reducing variance in GAN training from an optimization perspective, by directly changing the way each of the networks is optimized. Some have proposed applying the extragradient method (Chavdarova et al, 2019), and others have proposed casting the two-player minimax game as a variational-inequality problem (Gidel et al, 2018). Brock et al (2018) recently proposed to reduce variance directly by using large mini-batch sizes.

5.2 STABILITY IN GAN TRAINING

Stabilizing GANs has been extensively studied theoretically. Researchers have worked on improving the dynamics of the two-player minimax game in a variety of ways (Nagarajan & Kolter, 2017; Mescheder et al, 2018; Mescheder, 2018; Li et al, 2017b; Arora et al, 2017). Training instability has also been linked to the architectural properties of GANs, particularly the discriminator (Miyato et al, 2018). Proposed architectural stabilization techniques include using Convolutional Neural Networks (CNNs) (Radford et al, 2015), using very large batch sizes (Brock et al, 2018), using an ensemble of discriminators (Durugkar et al, 2016), using spectral normalization for the discriminator (Miyato et al, 2018), adding self-attention layers to the generator and discriminator networks (Vaswani et al, 2017; Zhang et al, 2018), and using iterative updates to a global generator and discriminator via an ensemble of paired generators and discriminators (Chavdarova & Fleuret, 2018). Different objectives have also been proposed to stabilize GAN training (Arjovsky et al, 2017; Gulrajani et al, 2017; Li et al, 2017a; Mao et al, 2017; Mroueh & Sercu, 2017; Bellemare et al, 2017).

Reference

- Pankaj K Agarwal, Sariel Har-Peled, and Kasturi R Varadarajan. Geometric approximation via coresets. Combinatorial and computational geometry, 52:1–30, 2005.
- Martin Arjovsky, Soumith Chintala, and Leon Bottou. Wasserstein gan. arXiv preprint arXiv:1701.07875, 2017.
- Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in generative adversarial nets (gans). In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 224–232. JMLR. org, 2017.
- Sanjeev Arora, Andrej Risteski, and Yi Zhang. Do gans learn the distribution? some theory and empirics. 2018.
- Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, and Augustus Odena. Discriminator rejection sampling. arXiv preprint arXiv:1810.06758, 2018.
- Mihai Badoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pp. 250–257. ACM, 2002.
- Francisco Barahona and Fabian A Chudak. Near-optimal solutions to large-scale facility location problems. Discrete Optimization, 2(1):35–50, 2005.
- Marc G Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, and Remi Munos. The cramer distance as a solution to biased wasserstein gradients. arXiv preprint arXiv:1705.10743, 2017.
- Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
- Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3):15, 2009.
- Tatjana Chavdarova and Francois Fleuret. Sgan: An alternative training of generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9407–9415, 2018.
- Tatjana Chavdarova, Gauthier Gidel, Francois Fleuret, and Simon Lacoste-Julien. Reducing noise in gan training with variance reduced extragradient. arXiv preprint arXiv:1904.08598, 2019.
- Kenneth L Clarkson. Coresets, sparse greedy approximation, and the frank-wolfe algorithm. ACM Transactions on Algorithms (TALG), 6(4):63, 2010.
- Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of johnson and lindenstrauss. Random Structures & Algorithms, 22(1):60–65, 2003.
- David L Donoho et al. High-dimensional data analysis: The curses and blessings of dimensionality. AMS math challenges lecture, 1(2000):32, 2000.
- Ishan Durugkar, Ian Gemp, and Sridhar Mahadevan. Generative multi-adversarial networks. arXiv preprint arXiv:1611.01673, 2016.
- Ahmet M Eskicioglu and Paul S Fisher. Image quality measures and their performance. IEEE Transactions on communications, 43(12):2959–2965, 1995.
- Reza Zanjirani Farahani and Masoud Hekmatfar. Facility location: concepts, models, algorithms and case studies. Springer, 2009.
- William Fedus, Ian Goodfellow, and Andrew M Dai. Maskgan: Better text generation via filling in the ______. arXiv preprint arXiv:1801.07736, 2018.
- Dan Feldman, Matthew Faulkner, and Andreas Krause. Scalable training of mixture models via coresets. In Advances in neural information processing systems, pp. 2142–2150, 2011.
- Gauthier Gidel, Hugo Berard, Gaetan Vignoud, Pascal Vincent, and Simon Lacoste-Julien. A variational inequality perspective on generative adversarial networks. arXiv preprint arXiv:1802.10551, 2018.
- Bernd Girod. What’s wrong with mean-squared error? Digital images and human vision, pp. 207–220, 1993.
- AJ Goldman. Optimal center location in simple networks. Transportation science, 5(2):212–221, 1971.
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014.
- Priya Goyal, Piotr Dollar, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
- Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans. In Advances in neural information processing systems, pp. 5767–5777, 2017.
- Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. Long text generation via adversarial training with leaked information. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
- Sariel Har-Peled and Akash Kushal. Smaller coresets for k-median and k-means clustering. Discrete & Computational Geometry, 37(1):3–19, 2007.
- Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pp. 291–300. ACM, 2004.
- Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6626–6637, 2017.
- Jonathan Huggins, Trevor Campbell, and Tamara Broderick. Coresets for scalable bayesian logistic regression. In Advances in Neural Information Processing Systems, pp. 4080–4088, 2016.
- Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134, 2017.
- Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836, 2016.
- Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
- Rithesh Kumar, Anirudh Goyal, Aaron Courville, and Yoshua Bengio. Maximum entropy generators for energy-based models. arXiv preprint arXiv:1901.08508, 2019.
- Donghwoon Kwon, Hyunjoo Kim, Jinoh Kim, Sang C Suh, Ikkyun Kim, and Kuinam J Kim. A survey of deep learning-based network anomaly detection. Cluster Computing, pp. 1–13, 2017.
- Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681–4690, 2017.
- Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, and Barnabas Poczos. Mmd gan: Towards deeper understanding of moment matching network. In Advances in Neural Information Processing Systems, pp. 2203–2213, 2017a.
- Jerry Li, Aleksander Madry, John Peebles, and Ludwig Schmidt. Towards understanding the dynamics of generative adversarial networks. arXiv preprint arXiv:1706.09884, 2017b.
- Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2794–2802, 2017.
- Lars Mescheder. On the convergence properties of gan training. arXiv preprint arXiv:1801.04406, 1:16, 2018.
- Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. Which training methods for gans do actually converge? arXiv preprint arXiv:1801.04406, 2018.
- Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
- Youssef Mroueh and Tom Sercu. Fisher gan. In Advances in Neural Information Processing Systems, pp. 2513–2523, 2017.
- Ben Mussay, Samson Zhou, Vladimir Braverman, and Dan Feldman. On activation function coresets for network pruning. arXiv preprint arXiv:1907.04018, 2019.
- Vaishnavh Nagarajan and J Zico Kolter. Gradient descent gan optimization is locally stable. In Advances in Neural Information Processing Systems, pp. 5585–5595, 2017.
- Cuong V Nguyen, Yingzhen Li, Thang D Bui, and Richard E Turner. Variational continual learning. arXiv preprint arXiv:1710.10628, 2017.
- Augustus Odena. Open questions about generative adversarial networks. Distill, 2019. doi: 10. 23915/distill.00018. https://distill.pub/2019/gan-open-problems.
- Catherine Olsson, Surya Bhupatiraju, Tom Brown, Augustus Odena, and Ian Goodfellow. Skill rating for generative models. arXiv preprint arXiv:1808.04888, 2018.
- Jeff M Phillips. Coresets and sketches. arXiv preprint arXiv:1601.00617, 2016.
- Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
- Tim Salimans, Han Zhang, Alec Radford, and Dimitris Metaxas. Improving gans using optimal transport. arXiv preprint arXiv:1803.05573, 2018.
- Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach. arXiv preprint arXiv:1708.00489, 2017.
- Christopher J Shallue, Jaehoon Lee, Joe Antognini, Jascha Sohl-Dickstein, Roy Frostig, and George E Dahl. Measuring the effects of data parallelism on neural network training. arXiv preprint arXiv:1811.03600, 2018.
- Samarth Sinha, Sayna Ebrahimi, and Trevor Darrell. Variational adversarial active learning. arXiv preprint arXiv:1904.00370, 2019.
- Samuel L Smith, Pieter-Jan Kindermans, Chris Ying, and Quoc V Le. Don’t decay the learning rate, increase the batch size. arXiv preprint arXiv:1711.00489, 2017.
- Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
- Ivor W Tsang, James T Kwok, and Pak-Ming Cheung. Core vector machines: Fast svm training on very large data sets. Journal of Machine Learning Research, 6(Apr):363–392, 2005.
- Ivor W Tsang, Andras Kocsor, and James T Kwok. Simpler core vector machines with enclosing balls. In Proceedings of the 24th international conference on Machine learning, pp. 911–918. ACM, 2007.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008, 2017.
- Kai Wei, Yuzong Liu, Katrin Kirchhoff, and Jeff Bilmes. Using document summarization techniques for speech data subset selection. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 721–726, 2013.
- Laurence A Wolsey and George L Nemhauser. Integer and combinatorial optimization. John Wiley & Sons, 2014.
- Yongqin Xian, Tobias Lorenz, Bernt Schiele, and Zeynep Akata. Feature generating networks for zero-shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5542–5551, 2018.
- Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in neural information processing systems, pp. 3320–3328, 2014.
- Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
- Houssam Zenati, Chuan Sheng Foo, Bruno Lecouat, Gaurav Manek, and Vijay Ramaseshan Chandrasekhar. Efficient gan-based anomaly detection. arXiv preprint arXiv:1802.06222, 2018.
- Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5907–5915, 2017.
- Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318, 2018.
- Han Zhang, Zizhao Zhang, Augustus Odena, and Honglak Lee. Consistency regularization for generative adversarial networks, 2019.
- Jia-Jie Zhu and Jose Bento. Generative adversarial active learning. arXiv preprint arXiv:1702.07956, 2017.
- Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223–2232, 2017.
