# Extra-gradient with player sampling for faster convergence in n-player games

ICML 2020.

Keywords:

learning with opponent-learning awareness, stochastic extra-gradient, extra-gradient, multi-agent, noisy gradient

Abstract:

Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs. In this paper, we analyse a new extra-gradient method for Nash equilibrium finding that performs gradient extrapolations and updates on a random subset of players at each iteration. This approach provably exhibits a better […]

Introduction

- Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs.
- The authors' approach makes extrapolation amenable to massive multi-player settings and brings empirical speed-ups, in particular when using a heuristic cyclic sampling scheme.
- Most importantly, it allows training faster and better GANs and mixtures of GANs. A growing number of models in machine learning require optimizing over multiple interacting objectives.
- These examples can be cast as games where players are parametrized modules that compete or cooperate to minimize their own objective functions.

Highlights

- Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training generative adversarial networks (GANs)
- We propose a doubly-stochastic extra-gradient (DSEG) algorithm (§3.2) that updates the strategies of a subset of players, performing player sampling
- Doubly-stochastic extra-gradient always brings a benefit in the convergence constants (Fig. 2a-b), in particular for smooth noisy problems (Fig. 2a center, Fig. 2b left)
- Without extrapolation, alternated training is known to perform better than simultaneous updates in WGAN-GP (Gulrajani et al, 2017)
- Doubly-stochastic extra-gradient outperforms stochastic extra-gradient for all learning rates; more importantly, higher learning rates can be used with doubly-stochastic extra-gradient, allowing for faster training
- We propose and analyse a doubly-stochastic extra-gradient approach for finding Nash equilibria
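
For intuition on what "player sampling" means in the update rule, here is a minimal sketch on a hypothetical strongly monotone linear toy game (the operator `A`, step size `eta`, player count, and iteration budget are all illustrative assumptions, not the paper's setup): each iteration samples `b` of the `n` players, extrapolates only their parameters, and then updates them using gradients taken at the extrapolated point.

```python
import numpy as np

rng = np.random.default_rng(0)

n, b, eta = 5, 2, 0.05   # players, sampled players per step, step size

# Toy strongly monotone linear game operator F(theta) = A @ theta, standing in
# for the concatenated per-player gradients (an illustrative assumption).
S = rng.standard_normal((n, n))
A = S - S.T + n * np.eye(n)   # skew part + n*I => symmetric part is n*I

theta = rng.standard_normal(n)
for _ in range(2000):
    players = rng.choice(n, size=b, replace=False)   # player sampling
    # Extrapolation step: only the sampled players "look ahead".
    theta_ex = theta.copy()
    theta_ex[players] -= eta * (A @ theta)[players]
    # Update step: gradients evaluated at the extrapolated point.
    theta[players] -= eta * (A @ theta_ex)[players]

print(np.linalg.norm(theta))   # shrinks toward the equilibrium theta* = 0
```

Only two gradient blocks are computed per iteration instead of two full gradients, which is where the computational saving comes from.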

Methods

- DSEG extrapolates and updates successive pairs of players, alternating the 4-step updates from §5.2
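
The heuristic cyclic scheme can be sketched as a schedule generator that reshuffles the player order on every pass, so that each player is extrapolated and updated exactly once per pass (an illustrative helper; the function name and the exact shuffling scheme are assumptions):

```python
import random

def cyclic_player_schedule(n, b, seed=0):
    """Yield blocks of b distinct players, cycling through a freshly
    shuffled permutation of all n players on every pass."""
    rng = random.Random(seed)
    while True:
        order = list(range(n))
        rng.shuffle(order)
        for i in range(0, n, b):      # assumes b divides n
            yield order[i:i + b]

schedule = cyclic_player_schedule(n=4, b=2)
blocks = [next(schedule) for _ in range(4)]   # two full passes
print(blocks)   # block contents are seed-dependent
```

Within one pass every player appears exactly once, unlike uniform sampling with replacement, which can leave some players unvisited for long stretches.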

Results

- Fig. 5 compares the result to full extra-gradient with uniform averaging
- It shows substantial improvements in FID, with results less sensitive to randomness.
- Without extrapolation, alternated training is known to perform better than simultaneous updates in WGAN-GP (Gulrajani et al, 2017).
- The authors' approach combines extrapolation with an alternated schedule
- It performs better than extrapolating with simultaneous updates.
- The authors compare the training curves of both SEG and DSEG in Fig. 5, for a range of learning rates.

Conclusion

- The authors propose and analyse a doubly-stochastic extra-gradient approach for finding Nash equilibria.
- According to the convergence results, updating and extrapolating random subsets of players in extra-gradient brings a speed-up in noisy and non-smooth convex problems.
- Doubly-stochastic extra-gradient brings speed-ups in convex settings, especially with noisy gradients.
- It brings speed-ups and improves solutions when training non-convex GANs and mixtures of GANs, combining the benefits of alternation and extrapolation in adversarial training.
- The authors foresee interesting developments using player sampling in reinforcement learning: the policy gradients obtained with multi-agent actor-critic methods (Lowe et al., 2017) are noisy estimates, a setting in which player sampling is beneficial.


- Table 1: New and existing (Juditsky et al., 2011) convergence rates for convex games, w.r.t. the number of gradient computations k. Doubly-stochastic extra-gradient (DSEG) multiplies the noise contribution by a factor proportional to b/n, where b is the number of sampled players among n. G bounds the gradient norm; L is the Lipschitz constant of the losses' gradients; σ² bounds the gradient estimation noise; Ω is the diameter of the parameter space.

Related work

- Extra-gradient method. In this paper, we focus on finding the Nash equilibrium in convex n-player games, or equivalently the variational inequality problem (Harker & Pang, 1990; Nemirovski et al., 2010). This can be done using the extrapolated gradient method (Korpelevich, 1976), a “cautious” gradient descent approach that was promoted by Nemirovski (2004) and Nesterov (2007) under the name mirror-prox; we review this work in §3.1. Juditsky et al. (2011) propose a stochastic variant of mirror-prox that assumes access to a noisy gradient oracle. In the convex setting, their results guarantee the convergence of the algorithm we propose, albeit with very slack rates. Our theoretical analysis refines these rates to show the usefulness of player sampling. Recently, Bach & Levy (2019) described a smoothness-adaptive variant of this algorithm similar to AdaGrad (Duchi et al., 2011), an approach that can be combined with ours. Yousefian et al. (2018) consider multi-agent games on networks and analyze a stochastic variant of extra-gradient that consists in randomly extrapolating and updating a single player. Compared to them, we analyse more general player-sampling strategies. Moreover, our analysis holds for non-smooth losses and provides better rates for smooth losses, through variance reduction. We also analyse precisely the reasons why player sampling is useful (see discussion in §4), an original endeavor.
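
For intuition on the “cautious” look-ahead, here is the classic two-step extra-gradient update of Korpelevich (1976) on the bilinear saddle problem min_x max_y xy, contrasted with naive simultaneous gradient descent-ascent, which is known to spiral away from the saddle point (0, 0); the step size and iteration count are illustrative choices:

```python
eta = 0.2

# Extra-gradient on min_x max_y x*y (toy example).
x, y = 1.0, 1.0
for _ in range(200):
    # Extrapolation ("look-ahead") step at the current point.
    x_e, y_e = x - eta * y, y + eta * x
    # Update using gradients evaluated at the extrapolated point.
    x, y = x - eta * y_e, y + eta * x_e

# Naive simultaneous gradient descent-ascent for comparison.
u, v = 1.0, 1.0
for _ in range(200):
    u, v = u - eta * v, v + eta * u

print(abs(x), abs(y))   # small: extra-gradient converges toward (0, 0)
print(abs(u), abs(v))   # large: simultaneous updates spiral outward
```

The extrapolated point anticipates the opponent's move, turning the rotational vector field of the bilinear game into a contraction.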

References

- Bach, F. and Levy, K. A universal algorithm for variational inequalities adaptive to smoothness and noise. In Proceedings of the Conference on Learning Theory, 2019.
- Balduzzi, D., Racanière, S., Martens, J., Foerster, J., Tuyls, K., and Graepel, T. The mechanics of n-player differentiable games. In Proceedings of the International Conference on Machine Learning, 2018.
- Bottou, L. and Bousquet, O. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems, pp. 161–168, 2008.
- Bu, L., Babu, R., De Schutter, B., et al. A comprehensive survey of multi-agent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156–172, 2008.
- Bubeck, S. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4): 231–357, 2015.
- Chambolle, A. and Pock, T. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120– 145, 2011.
- Chavdarova, T., Gidel, G., Fleuret, F., Foo, C.-S., and Lacoste-Julien, S. Reducing noise in GAN training with variance reduced extragradient. In Advances in Neural Information Processing Systems, 2019.
- Cheung, Y. K. and Piliouras, G. Vortices Instead of Equilibria in MinMax Optimization: Chaos and Butterfly Effects of Online Learning in Zero-Sum Games. In Proceedings of the Conference on Learning Theory, 2019.
- Defazio, A., Bach, F., and Lacoste-Julien, S. SAGA: A fast incremental gradient method with support for nonstrongly convex composite objectives. In Advances in Neural Information Processing Systems, pp. 1646–1654, 2014.
- Duchi, J., Hazan, E., and Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.
- Foerster, J., Chen, R. Y., Al-Shedivat, M., Whiteson, S., Abbeel, P., and Mordatch, I. Learning with opponent-learning awareness. In Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems, 2018.
- Gelfand, I. Normierte ringe. Matematicheskii Sbornik, 9(1): 3–24, 1941.
- Ghosh, A., Kulharia, V., Namboodiri, V., Torr, P. H. S., and Dokania, P. K. Multi-agent diverse generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
- Gidel, G., Berard, H., Vignoud, G., Vincent, P., and Lacoste-Julien, S. A variational inequality perspective on generative adversarial networks. In International Conference on Learning Representations, 2019.
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
- Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pp. 5767–5777, 2017.
- Harker, P. T. and Pang, J.-S. Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications. Mathematical Programming, 48(1-3):161–220, 1990.
- He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
- Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6626–6637, 2017.
- Iusem, A., Jofré, A., Oliveira, R. I., and Thompson, P. Extragradient method with variance reduction for stochastic variational inequalities. SIAM Journal on Optimization, 27(2):686–724, 2017.
- Juditsky, A., Nemirovski, A., and Tauvel, C. Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems, 1(1):17–58, 2011.
- Kim, S.-J., Magnani, A., and Boyd, S. Robust Fisher discriminant analysis. In Advances in Neural Information Processing Systems, 2006.
- Kingma, D. P. and Ba, J. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations, 2015.
- Korpelevich, G. The extragradient method for finding saddle points and other problems. Matecon, 12:747–756, 1976.
- Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
- Letcher, A., Foerster, J., Balduzzi, D., Rocktäschel, T., and Whiteson, S. Stable opponent shaping in differentiable games. In International Conference on Learning Representations, 2019.
- Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O. P., and Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pp. 6379–6390, 2017.
- Magnanti, T. L. and Perakis, G. Averaging schemes for variational inequalities and systems of equations. Mathematics of Operations Research, 22(3):568–587, 1997.
- Mazumdar, E. V., Jordan, M. I., and Sastry, S. S. On finding local Nash equilibria (and only local Nash equilibria) in zero-sum games. arXiv:1901.00838, 2019.
- Mertikopoulos, P., Lecouat, B., Zenati, H., Foo, C.-S., Chandrasekhar, V., and Piliouras, G. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile. In International Conference on Learning Representations, 2019.
- Mescheder, L., Nowozin, S., and Geiger, A. The numerics of GANs. In Advances in Neural Information Processing Systems, pp. 1825–1835, 2017.
- Nash, J. Non-cooperative games. Annals of Mathematics, pp. 286–295, 1951.
- Nedic, A. and Ozdaglar, A. Subgradient methods for saddlepoint problems. Journal of Optimization Theory and Applications, 142(1):205–228, 2009.
- Nemirovski, A. Prox-method with rate of convergence o(1/t) for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
- Nemirovski, A., Onn, S., and Rothblum, U. G. Accuracy certificates for computational problems with convex structure. Mathematics of Operations Research, 35(1):52–78, 2010.
- Nemirovsky, A. S. and Yudin, D. B. Problem complexity and method efficiency in optimization. Wiley, 1983.
- Palaniappan, B. and Bach, F. Stochastic variance reduction methods for saddle-point problems. In Advances in Neural Information Processing Systems, pp. 1416–1424, 2016.
- Racanière, S., Weber, T., Reichert, D., Buesing, L., Guez, A., Rezende, D. J., Badia, A. P., Vinyals, O., Heess, N., Li, Y., et al. Imagination-augmented agents for deep reinforcement learning. In Advances in Neural Information Processing Systems, pp. 5690–5701, 2017.
- Robbins, H. and Monro, S. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3): 400–407, 1951.
- Rockafellar, R. T. Monotone operators associated with saddle-functions and minimax problems. In Proceedings of Symposia in Pure Mathematics, volume 18.1, pp. 241– 250, 1970.
- Rosen, J. B. Existence and uniqueness of equilibrium points for concave n-person games. Econometrica, 33(3):520– 534, 1965.
- Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X., and Chen, X. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pp. 2234–2242, 2016.
- Wayne, G. and Abbott, L. Hierarchical control using networks trained with higher-level forward models. Neural Computation, 26(10):2163–2193, 2014.
- Yousefian, F., Nedic, A., and Shanbhag, U. V. On stochastic mirror-prox algorithms for stochastic cartesian variational inequalities: Randomized block coordinate and optimal averaging schemes. Set-Valued and Variational Analysis, 26(4):789–819, 2018.
- Zhang, C. and Lesser, V. Multi-agent learning with policy prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, 2010.
- Nesterov, Y. Dual extrapolation and its applications to solving variational inequalities and related problems. Mathematical Programming, 109(2-3):319–344, 2007.
- Nikaidô, H. and Isoda, K. Note on non-cooperative convex games. Pacific Journal of Mathematics, 5(Suppl. 1):807– 815, 1955.
