How to Train Your DRAGAN
CoRR abs/1705.07215, 2017.
Generative Adversarial Networks have emerged as an effective technique for estimating data distributions. The basic setup consists of two deep networks playing against each other in a zero-sum game setting. However, it is not understood whether the networks eventually reach an equilibrium, or what dynamics makes this possible.
- Generative modeling involves using a training set to learn a probability distribution P_model that closely resembles the data-generating distribution P_real.
- The common practice in the community is to use simultaneous gradient descent to try and converge to this point.
- This procedure lacks a clear game-theoretic justification in the literature, and the original paper argues that ideally each player should be trained to optimality at every step.
- We connect the idea of regret minimization to Generative Adversarial Networks and provide a novel way to reason about the dynamics in this game
- We analyze mode collapse from a game-theoretic perspective and hypothesize that spurious local equilibria are responsible for this issue
- We propose a novel regularization scheme as part of our algorithm DRAGAN
- The authors present a series of experiments using the MNIST, CIFAR-10, and CelebA datasets, demonstrating competitive inception-score results and sample quality compared to the baseline algorithms.
- 5.1 Inception Scores on CIFAR-10 using the DCGAN architecture: DCGAN is a family of architectures designed to perform well with the vanilla training procedure.
- They are ubiquitous in the GAN literature owing to the instability of the vanilla algorithm in general settings.
- The authors use this architecture to model the CIFAR-10 dataset and compare to the vanilla GAN, WGAN, and improved WGAN.
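For reference, the inception score used in these comparisons is IS = exp(E_x[KL(p(y|x) || p(y))]). Below is a minimal sketch of that computation from a matrix of per-sample class probabilities; the probability matrices here are synthetic stand-ins, not outputs of the actual Inception network:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Compute exp(mean_x KL(p(y|x) || p(y))) from an (n, k) matrix
    of per-sample class probabilities."""
    p_y = probs.mean(axis=0)  # marginal class distribution p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Perfectly confident and perfectly diverse predictions over 10 classes
# give the maximum score of 10; uniform predictions give the minimum of 1.
one_hot = np.eye(10)[np.arange(100) % 10]
uniform = np.full((100, 10), 0.1)
print(inception_score(one_hot))   # ~10.0
print(inception_score(uniform))   # ~1.0
```

Higher scores indicate samples that are both individually recognizable (confident p(y|x)) and collectively diverse (spread-out p(y)), which is why the score is sensitive to mode collapse.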
- [Figure: inception-score plot and sample grids on CIFAR-10, panels including (a) Inception Score Plot and (c) Vanilla GAN]
- The authors draw upon the game theory literature to justify the current GAN training procedure and propose an improved algorithm.
- The authors connect the idea of regret minimization to GANs and provide a novel way to reason about the dynamics in this game.
- Given this background, the authors analyze mode collapse from a game-theoretic perspective and hypothesize that spurious local equilibria are responsible for this issue.
- The authors' algorithm is simple to implement, fast, and improves stability in a wide variety of settings.
- Leveraging the analysis of no-regret algorithms in convex games, we motivate the use of simultaneous gradient descent in GAN training.
- We hypothesize that the difficulty in training GANs, especially due to mode collapse, results from the existence of spurious local Nash equilibria in non-convex games.
- We propose a new algorithm (DRAGAN) that performs smoothing of the discriminator function by constraining its gradients around the real samples.
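To make the proposed penalty concrete, here is a minimal numpy sketch of a gradient penalty of this form on a toy discriminator D(x) = tanh(w · x), whose input gradient is available in closed form. The coefficient `lambda_` and the perturbation scale are illustrative choices, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

w = np.array([0.6, 0.8])  # toy discriminator weights, ||w|| = 1

def grad_D(x):
    """Closed-form input gradient of D(x) = tanh(w . x)."""
    return (1.0 - np.tanh(w @ x) ** 2) * w

def gradient_penalty(x_real, lambda_=10.0, noise_scale=0.1):
    """Penalize deviation of ||grad_x D|| from 1 at perturbed real points."""
    penalties = []
    for x in x_real:
        x_pert = x + noise_scale * rng.standard_normal(x.shape)
        g = grad_D(x_pert)
        penalties.append((np.linalg.norm(g) - 1.0) ** 2)
    return lambda_ * float(np.mean(penalties))

x_real = rng.standard_normal((32, 2))
print(gradient_penalty(x_real))  # non-negative smoothing penalty
```

In practice the input gradient would come from automatic differentiation rather than a closed form, and the penalty would be added to the discriminator's loss on each training step.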
- In the present section we describe how the training procedure for GANs in general can be viewed through the lens of equilibrium computation in zero-sum convex/concave games.
- Ignoring the case of constraint violations, OGD can be written in a simple iterative form: θ_t = θ_{t−1} − η∇L_t. The min/max objective function in GANs involves a stochastic payoff function, with two randomized inputs given on each round, x and z, which are sampled from the data distribution and a standard multivariate normal, respectively.
- Due to the non-convex nature of GAN settings, we are not guaranteed that a sole Nash equilibrium exists; there may be many non-optimal saddle points, which are essentially local minima of the game.
- We believe the following conjecture accounts for the dramatic reduction in mode collapse: the local norm-1 gradient regularization penalty provides sufficient smoothing of the game payoff to significantly reduce the space of non-optimal saddle points.
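As a sanity check of these dynamics, here is a small sketch using the classic bilinear zero-sum game f(x, y) = x·y (an illustrative example, not from the paper): simultaneous OGD iterates spiral around the equilibrium at the origin, while their time-average approaches it, which is the sense in which no-regret play reaches equilibrium on average:

```python
import numpy as np

eta = 0.01        # step size
x, y = 1.0, 1.0   # initial strategies of the two players
xs, ys = [], []

for _ in range(1000):
    gx, gy = y, x                        # grad_x f = y, grad_y f = x for f = x*y
    x, y = x - eta * gx, y + eta * gy    # simultaneous descent/ascent step
    xs.append(x)
    ys.append(y)

last = np.hypot(x, y)                    # distance of last iterate from (0, 0)
avg = np.hypot(np.mean(xs), np.mean(ys)) # distance of averaged iterate
print(last, avg)  # last iterate drifts outward; the average heads to (0, 0)
```

This illustrates why last-iterate behavior of simultaneous gradient descent can look unstable even in a convex/concave game where the averaged play is well-behaved.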
- We present a series of experiments using the MNIST, CIFAR-10, and CelebA datasets demonstrating competitive inception score  results and sample quality compared to the baseline algorithms.
- For the set of randomized architectures, we perform qualitative and quantitative assessment ranking of our algorithm compared to the vanilla GAN algorithm to assess improvements in stability, i.e. performance characteristics related to mode collapse or failure to learn.
- To measure this more thoroughly, we introduce a metric termed the BogoNet score to compare the stability of different training procedures in the GAN setting.
- The basic idea is to choose random architectures for players G and D independently and evaluate the performance of different algorithms in the resulting games.
- Both the qualitative and the quantitative analysis demonstrate that we achieve more stable performance compared to the vanilla procedure and solve stability issues to some extent.
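The architecture-sampling step of such an evaluation might look like the following sketch; the hyperparameter ranges and the architecture fields are illustrative assumptions, not the paper's actual search space:

```python
import random

def sample_architecture(rng):
    """Draw one random player architecture (illustrative fields only)."""
    return {
        "depth": rng.randint(2, 6),                      # number of conv blocks
        "width": rng.choice([32, 64, 128]),              # base channel count
        "normalization": rng.choice(["none", "batch", "layer"]),
        "activation": rng.choice(["relu", "leaky_relu", "elu"]),
    }

rng = random.Random(0)
# Architectures for G and D are sampled independently for each game,
# and each training algorithm is then scored on the same set of games.
games = [(sample_architecture(rng), sample_architecture(rng)) for _ in range(5)]
for g_arch, d_arch in games:
    print(g_arch, d_arch)
```

Scoring each algorithm on the same randomized pool of (G, D) pairs is what lets the comparison measure robustness to architecture choice rather than performance on one hand-tuned design.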
- We draw upon the game theory literature to justify the current GAN training procedure and propose an improved algorithm.
- Our algorithm is simple to implement, fast, and improves stability in a wide variety of settings.
- Table 1: Summary of inception-score statistics across 100 architectures
- There have been several works aimed at finding a stable way to train GANs. Radford et al. proposed a stable family of architectures called deep convolutional generative adversarial networks (DCGANs). We show that such constraints on architectures can be relaxed while still being able to achieve stability in the training process. In an alternate direction, a number of works have focused on developing specific objective functions that improve the stability and performance of GANs. Salimans et al. introduced a variety of techniques to improve the quality of samples. Che et al. proposed a family of regularizers to address the missing-modes problem in GANs. Zhao et al. introduced an energy-based GAN framework that is more stable to train. Metz et al. developed unrolled GANs, taking inspiration from the game theory literature; however, this approach suffers from slow performance due to the requirement of multiple unrolling steps in each iteration. Recently, we have seen a series of works based on imposing a Lipschitz constraint on the discriminator function. Guo-Jun Qi introduced LS-GAN with the idea of maintaining a margin between the losses assigned to real and fake samples; specifically, they enforce a margin condition in which the discriminator is used as the loss function.
- Introduces regret minimization as a technique to reach equilibrium in games, and uses this to justify the success of simultaneous gradient descent in GANs
- Develops an algorithm called DRAGAN that is fast, simple to implement, and achieves competitive performance in a stable fashion across different architectures, datasets, and divergence measures with almost no hyperparameter tuning
- Shows significant improvements over the recently proposed Wasserstein GAN variants
- Introduces regret minimization as a technique to reach a Nash equilibrium in games
- Proposes a new algorithm that performs smoothing of the discriminator function by constraining its gradients around the real samples
- J. von Neumann. “Zur Theorie der Gesellschaftsspiele”. In: Mathematische Annalen 100 (1928), pp. 295–320. URL: http://eudml.org/doc/159291.
- John Nash. “Two-person cooperative games”. In: Econometrica: Journal of the Econometric Society (1953), pp. 128–140.
- Maurice Sion. “On general minimax theorems”. In: Pacific J. Math 8.1 (1958), pp. 171–176.
- Martin Zinkevich. “Online convex programming and generalized infinitesimal gradient ascent”. In: Proceedings of the 20th International Conference on Machine Learning (ICML). 2003, pp. 928–936.
- Nicolo Cesa-Bianchi, Alex Conconi, and Claudio Gentile. “On the generalization ability of on-line learning algorithms”. In: IEEE Transactions on Information Theory 50.9 (2004), pp. 2050–2057.
- Nicolo Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge university press, 2006.
- Noam Nisan et al. Algorithmic game theory. Vol. 1. Cambridge University Press Cambridge, 2007.
- Lillian J Ratliff, Samuel A Burden, and S Shankar Sastry. “Characterization and computation of local nash equilibria in continuous games”. In: Communication, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on. IEEE. 2013, pp. 917–924.
- Ian Goodfellow et al. “Generative Adversarial Nets”. In: Advances in Neural Information Processing Systems 27. Ed. by Z. Ghahramani et al. Curran Associates, Inc., 2014, pp. 2672–2680. URL: http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf.
- Diederik Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”. In: arXiv preprint arXiv:1412.6980 (2014).
- Alec Radford, Luke Metz, and Soumith Chintala. “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”. In: arXiv:1511.06434 [cs] (Nov. 2015). arXiv: 1511.06434. URL: http://arxiv.org/abs/1511.06434 (visited on 05/16/2017).
- Tong Che et al. “Mode Regularized Generative Adversarial Networks”. In: arXiv preprint arXiv:1612.02136 (2016).
- Ian Goodfellow. “NIPS 2016 Tutorial: Generative Adversarial Networks”. In: arXiv preprint arXiv:1701.00160 (2016).
- Luke Metz et al. “Unrolled Generative Adversarial Networks”. In: CoRR abs/1611.02163 (2016). URL: http://arxiv.org/abs/1611.02163.
- Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. “f-GAN: Training generative neural samplers using variational divergence minimization”. In: Advances in Neural Information Processing Systems. 2016, pp. 271–279.
- Tim Salimans et al. “Improved Techniques for Training GANs”. In: CoRR abs/1606.03498 (2016). URL: http://arxiv.org/abs/1606.03498.
- Junbo Zhao, Michael Mathieu, and Yann LeCun. “Energy-based generative adversarial network”. In: arXiv preprint arXiv:1609.03126 (2016).
- Martin Arjovsky, Soumith Chintala, and Léon Bottou. “Wasserstein GAN”. In: arXiv preprint arXiv:1701.07875 (2017).
- Ishaan Gulrajani et al. “Improved Training of Wasserstein GANs”. In: arXiv preprint arXiv:1704.00028 (2017).
- Guo-Jun Qi. “Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities”. In: CoRR abs/1701.06264 (2017). URL: http://arxiv.org/abs/1701.06264.
- 1. Use all-convolutional networks which learn their own spatial downsampling (discriminator) or upsampling (generator)
- 2. Remove fully connected hidden layers for deeper architectures
- 3. Use batch normalization in both the generator and the discriminator
- 4. Use ReLU activation in the generator for all layers except the output layer, which uses tanh
- 5. Use LeakyReLU activation in the discriminator for all layers
- We show that such constraints can be relaxed for our algorithm, and hence practitioners are free to choose from a more diverse set of architectures. Below, we present a series of experiments in which we remove different stabilizing components from the DCGAN architecture and analyze the performance of our algorithm. In each case, our algorithm is stable while the vanilla procedure fails. A similar approach is used to establish the robustness of training procedures in [3, 4]. Note that we add layer normalization to the discriminator (in our architecture experiments) in place of batch normalization. We chose the following four architectures, which are difficult to train (in each case, we start with the base DCGAN architecture and apply the changes):