Extra-gradient with player sampling for faster convergence in n-player games

Samy Jelassi
Carles Domingo-Enrich
Damien Scieur
Arthur Mensch
Joan Bruna

ICML 2020.

Keywords:
Learning with opponent-learning awareness, stochastic extra-gradient, extra-gradient, multi-agent, noisy gradient

Abstract:

Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs. In this paper, we analyse a new extra-gradient method for Nash equilibrium finding that performs gradient extrapolations and updates on a random subset of players at each iteration. This approach provably exhibits better convergence rates than full extra-gradient for non-smooth convex games with noisy gradient oracles.

Introduction
  • Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs.
  • The authors' approach makes extrapolation amenable to massive multiplayer settings, and brings empirical speed-ups, in particular when using a heuristic cyclic sampling scheme.
  • Most importantly, it allows training GANs and mixtures of GANs faster and to better solutions. A growing number of models in machine learning require optimizing over multiple interacting objectives.
  • These examples can be cast as games where players are parametrized modules that compete or cooperate to minimize their own objective functions
Highlights
  • Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training generative adversarial networks (GANs)
  • We propose a doubly-stochastic extra-gradient (DSEG) algorithm (§3.2) that updates the strategies of a subset of players at each iteration, performing player sampling (see the sketch after this list)
  • Doubly-stochastic extra-gradient consistently improves the convergence constants (Fig. 2a-b), in particular for smooth noisy problems (Fig. 2a center, Fig. 2b left)
  • Without extrapolation, alternated training is known to perform better than simultaneous updates in WGAN-GP (Gulrajani et al, 2017)
  • Doubly-stochastic extra-gradient outperforms stochastic extra-gradient for all learning rates; more importantly, higher learning rates can be used for doubly-stochastic extra-gradient, allowing for faster training
  • We propose and analyse a doubly-stochastic extra-gradient approach for finding Nash equilibria
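
A minimal sketch of the player-sampling idea on a toy strongly monotone linear game (not the authors' implementation; the game matrix, noise level, step size and function names below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy monotone 5-player game: player i controls one coordinate, and the
# stacked gradient of all losses is F(x) = A x + c with a positive-definite
# symmetric part, so the unique Nash equilibrium solves A x + c = 0.
n = 5
S = rng.standard_normal((n, n))
A = S - S.T + 0.1 * np.eye(n)          # skew part + small symmetric part
c = rng.standard_normal(n)
x_star = np.linalg.solve(A, -c)        # equilibrium of the toy game

def F(x):
    """Noisy oracle for the stacked player gradients."""
    return A @ x + c + 0.01 * rng.standard_normal(n)

def dseg(n_iter=5000, lr=0.05, b=1):
    """Doubly-stochastic extra-gradient: extrapolate, then update, a random
    subset of b players per iteration, and average the iterates."""
    x = np.zeros(n)
    avg = np.zeros(n)
    for k in range(n_iter):
        block = rng.choice(n, size=b, replace=False)
        x_hat = x.copy()
        # (a real implementation would only compute the sampled players' gradients)
        x_hat[block] -= lr * F(x)[block]      # extrapolation (look-ahead) step
        x[block] -= lr * F(x_hat)[block]      # update at the extrapolated point
        avg += (x - avg) / (k + 1)            # uniform averaging of the iterates
    return avg

print("distance to equilibrium:", np.linalg.norm(dseg() - x_star))
```

Each iteration only touches the sampled block of players; in a real implementation only those players' gradients need to be computed, so the per-iteration cost scales with b rather than n, which is where the speed-ups discussed above come from.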
Methods
  • DSEG extrapolates and updates successive pairs of players, alternating the 4-step updates from §5.2 (sketched below)
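
A sketch of what such an alternated (cyclic) player-sampling schedule could look like; the pairing of generators and discriminators and the names below are illustrative assumptions, not the authors' implementation:

```python
import itertools

def cyclic_pairs(players):
    """Deterministic cyclic player sampling: sweep over successive
    (generator, discriminator) pairs instead of drawing players at random."""
    pairs = [tuple(players[i:i + 2]) for i in range(0, len(players), 2)]
    return itertools.cycle(pairs)

# Hypothetical mixture of GANs with two generator/discriminator pairs.
schedule = cyclic_pairs(["G1", "D1", "G2", "D2"])
for step in range(4):
    pair = next(schedule)
    # In DSEG, only the players in `pair` are extrapolated and then updated
    # here, i.e. the 4 gradient evaluations of one extra-gradient step.
    print(step, pair)
```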
Results
  • Fig. 5 compares the results to full extra-gradient with uniform averaging
  • It shows substantial improvements in FID, with results less sensitive to randomness.
  • Without extrapolation, alternated training is known to perform better than simultaneous updates in WGAN-GP (Gulrajani et al, 2017).
  • The authors' approach combines extrapolation with an alternated schedule
  • It performs better than extrapolating with simultaneous updates.
  • The authors compare the training curves of both SEG and DSEG in Fig. 5, for a range of learning rates.
Conclusion
  • The authors propose and analyse a doubly-stochastic extra-gradient approach for finding Nash equilibria.
  • According to the convergence results, updating and extrapolating random sets of players in extra-gradient brings speed-ups in noisy and non-smooth convex problems.
  • Doubly-stochastic extra-gradient brings speed-ups in convex settings, especially with noisy gradients.
  • It brings speed-ups and improves solutions when training non-convex GANs and mixtures of GANs, combining the benefits of alternation and extrapolation in adversarial training.
  • The authors foresee interesting developments using player sampling in reinforcement learning: the policy gradients obtained using multi-agent actor-critic methods (Lowe et al, 2017) are noisy estimates, a setting in which player sampling is expected to be beneficial
Tables
  • Table 1: New and existing (Juditsky et al, 2011) convergence rates for convex games, w.r.t. the number of gradient computations k. Doubly-stochastic extra-gradient (DSEG) multiplies the noise contribution by a factor α b/n, where b is the number of sampled players among n. G bounds the gradient norm, L is the Lipschitz constant of the losses' gradients, σ² bounds the gradient estimation noise, and Ω is the diameter of the parameter space.
Related work
  • Extra-gradient method. In this paper, we focus on finding the Nash equilibrium in convex n-player games, or equivalently the Variational Inequality problem (Harker & Pang, 1990; Nemirovski et al, 2010). This can be done using extrapolated gradient (Korpelevich, 1976), a "cautious" gradient descent approach that was promoted by Nemirovski (2004) and Nesterov (2007) under the name mirror-prox; we review this work in §3.1. Juditsky et al (2011) propose a stochastic variant of mirror-prox that assumes access to a noisy gradient oracle. In the convex setting, their results guarantee the convergence of the algorithm we propose, albeit with very slack rates. Our theoretical analysis refines these rates to show the usefulness of player sampling. Recently, Bach & Levy (2019) described a smoothness-adaptive variant of this algorithm similar to AdaGrad (Duchi et al, 2011), an approach that can be combined with ours. Yousefian et al (2018) consider multi-agent games on networks and analyze a stochastic variant of extra-gradient that consists in randomly extrapolating and updating a single player. Compared to them, we analyse more general player sampling strategies. Moreover, our analysis holds for non-smooth losses, and provides better rates for smooth losses through variance reduction. We also analyse precisely the reasons why player sampling is useful (see discussion in §4), an original endeavor.
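
For reference, a minimal Euclidean instance of this extrapolate-then-update rule on a bilinear min-max toy problem (a sketch under assumed constants, not the general mirror-prox algorithm):

```python
import numpy as np

# Bilinear saddle point min_u max_v u^T M v: simultaneous gradient
# descent-ascent diverges here, while the extrapolated update converges.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
u, v = np.ones(3), np.ones(3)
lr = 0.1

for _ in range(1000):
    # Extrapolation ("look-ahead") step for both players.
    u_hat = u - lr * (M @ v)
    v_hat = v + lr * (M.T @ u)
    # Update step, using gradients taken at the extrapolated point.
    u = u - lr * (M @ v_hat)
    v = v + lr * (M.T @ u_hat)

print(np.linalg.norm(u), np.linalg.norm(v))  # both shrink toward the equilibrium (0, 0)
```

Player sampling, as proposed in this paper, amounts to performing this extrapolation and update on only a sampled subset of the players at each iteration.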
References
  • Bach, F. and Levy, K. A universal algorithm for variational inequalities adaptive to smoothness and noise. In Proceedings of the Conference on Learning Theory, 2019.
  • Balduzzi, D., Racanière, S., Martens, J., Foerster, J., Tuyls, K., and Graepel, T. The mechanics of n-player differentiable games. In Proceedings of the International Conference on Machine Learning, 2018.
  • Bottou, L. and Bousquet, O. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems, pp. 161–168, 2008.
  • Bu, L., Babu, R., De Schutter, B., et al. A comprehensive survey of multi-agent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156–172, 2008.
  • Bubeck, S. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015.
  • Chambolle, A. and Pock, T. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
  • Chavdarova, T., Gidel, G., Fleuret, F., Foo, C.-S., and Lacoste-Julien, S. Reducing noise in GAN training with variance reduced extragradient. In Advances in Neural Information Processing Systems, 2019.
  • Cheung, Y. K. and Piliouras, G. Vortices instead of equilibria in min-max optimization: Chaos and butterfly effects of online learning in zero-sum games. In Proceedings of the Conference on Learning Theory, 2019.
  • Defazio, A., Bach, F., and Lacoste-Julien, S. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems, pp. 1646–1654, 2014.
  • Duchi, J., Hazan, E., and Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.
  • Foerster, J., Chen, R. Y., Al-Shedivat, M., Whiteson, S., Abbeel, P., and Mordatch, I. Learning with opponent-learning awareness. In Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems, 2018.
  • Gelfand, I. Normierte Ringe. Matematicheskii Sbornik, 9(1):3–24, 1941.
  • Ghosh, A., Kulharia, V., Namboodiri, V., Torr, P. H. S., and Dokania, P. K. Multi-agent diverse generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
  • Gidel, G., Berard, H., Vignoud, G., Vincent, P., and Lacoste-Julien, S. A variational inequality perspective on generative adversarial networks. In International Conference on Learning Representations, 2019.
  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
  • Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pp. 5767–5777, 2017.
  • Harker, P. T. and Pang, J.-S. Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications. Mathematical Programming, 48(1-3):161–220, 1990.
  • He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
  • Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6626–6637, 2017.
  • Iusem, A., Jofré, A., Oliveira, R. I., and Thompson, P. Extragradient method with variance reduction for stochastic variational inequalities. SIAM Journal on Optimization, 27(2):686–724, 2017.
  • Juditsky, A., Nemirovski, A., and Tauvel, C. Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems, 1(1):17–58, 2011.
  • Kim, S.-J., Magnani, A., and Boyd, S. Robust Fisher discriminant analysis. In Advances in Neural Information Processing Systems, 2006.
  • Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
  • Korpelevich, G. The extragradient method for finding saddle points and other problems. Matecon, 12:747–756, 1976.
  • Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Technical report, 2009.
  • Letcher, A., Foerster, J., Balduzzi, D., Rocktäschel, T., and Whiteson, S. Stable opponent shaping in differentiable games. In International Conference on Learning Representations, 2019.
  • Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., and Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pp. 6379–6390, 2017.
  • Magnanti, T. L. and Perakis, G. Averaging schemes for variational inequalities and systems of equations. Mathematics of Operations Research, 22(3):568–587, 1997.
  • Mazumdar, E. V., Jordan, M. I., and Sastry, S. S. On finding local Nash equilibria (and only local Nash equilibria) in zero-sum games. arXiv:1901.00838, 2019.
  • Mertikopoulos, P., Lecouat, B., Zenati, H., Foo, C.-S., Chandrasekhar, V., and Piliouras, G. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile. In International Conference on Learning Representations, 2019.
  • Mescheder, L., Nowozin, S., and Geiger, A. The numerics of GANs. In Advances in Neural Information Processing Systems, pp. 1825–1835, 2017.
  • Nash, J. Non-cooperative games. Annals of Mathematics, pp. 286–295, 1951.
  • Nedic, A. and Ozdaglar, A. Subgradient methods for saddle-point problems. Journal of Optimization Theory and Applications, 142(1):205–228, 2009.
  • Nemirovski, A. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
  • Nemirovski, A., Onn, S., and Rothblum, U. G. Accuracy certificates for computational problems with convex structure. Mathematics of Operations Research, 35(1):52–78, 2010.
  • Nemirovsky, A. S. and Yudin, D. B. Problem complexity and method efficiency in optimization. Wiley, 1983.
  • Nesterov, Y. Dual extrapolation and its applications to solving variational inequalities and related problems. Mathematical Programming, 109(2-3):319–344, 2007.
  • Nikaidô, H. and Isoda, K. Note on non-cooperative convex games. Pacific Journal of Mathematics, 5(Suppl. 1):807–815, 1955.
  • Palaniappan, B. and Bach, F. Stochastic variance reduction methods for saddle-point problems. In Advances in Neural Information Processing Systems, pp. 1416–1424, 2016.
  • Racanière, S., Weber, T., Reichert, D., Buesing, L., Guez, A., Rezende, D. J., Badia, A. P., Vinyals, O., Heess, N., Li, Y., et al. Imagination-augmented agents for deep reinforcement learning. In Advances in Neural Information Processing Systems, pp. 5690–5701, 2017.
  • Robbins, H. and Monro, S. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407, 1951.
  • Rockafellar, R. T. Monotone operators associated with saddle-functions and minimax problems. In Proceedings of Symposia in Pure Mathematics, volume 18.1, pp. 241–250, 1970.
  • Rosen, J. B. Existence and uniqueness of equilibrium points for concave n-person games. Econometrica, 33(3):520–534, 1965.
  • Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pp. 2234–2242, 2016.
  • Wayne, G. and Abbott, L. Hierarchical control using networks trained with higher-level forward models. Neural Computation, 26(10):2163–2193, 2014.
  • Yousefian, F., Nedic, A., and Shanbhag, U. V. On stochastic mirror-prox algorithms for stochastic Cartesian variational inequalities: Randomized block coordinate and optimal averaging schemes. Set-Valued and Variational Analysis, 26(4):789–819, 2018.
  • Zhang, C. and Lesser, V. Multi-agent learning with policy prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, 2010.