# Fast computation of Nash Equilibria in Imperfect Information Games

ICML, pp. 7119-7129, 2020.

Abstract:

We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), for computing Nash equilibria in two-player zero-sum games, both in normal form and in sequential form with imperfect information. These algorithms update the policy of each player with a mirror-ascent step to maximize the value of pl…

Introduction

- This paper considers the problem of computing a Nash equilibrium for two-player zero-sum games in two types of games: normal-form games and imperfect information games (IIGs) in extensive form.
- By that the authors mean that some weighted ℓ2 distance between the policies produced by the algorithm and the set of Nash equilibria decreases as O(exp(−βt)), for some problem-dependent constant β > 0, where t is the number of iterations of the algorithm.
- The authors' analysis shows that the speed of convergence to the set of Nash equilibria depends on a measure of how much each player is able to improve its own policy against a fixed opponent.
- The authors' analysis shows convergence for all such cases, which opens new avenues for designing algorithms with convergence guarantees, while offering a trade-off in terms of computational cost versus convergence speed toward the Nash equilibrium.
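The improvement quantity that drives this rate can be made concrete on a small matrix game. Below is a minimal illustrative sketch (the game matrix and the `improvement` helper are assumptions for illustration, not code from the paper) measuring how much the row player can gain over its current policy against a fixed opponent, using the best response as the improved policy:

```python
import numpy as np

# Matching pennies: the row player maximizes x^T A y, the column player minimizes it.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def improvement(x, y, A):
    """Gain of the row player from switching from x to a best response
    against the fixed opponent policy y."""
    payoffs = A @ y              # expected payoff of each pure row action
    return payoffs.max() - x @ payoffs

uniform = np.array([0.5, 0.5])
print(improvement(uniform, uniform, A))               # 0.0: uniform is the equilibrium
print(improvement(uniform, np.array([1.0, 0.0]), A))  # 1.0: heads-only is exploitable
```

At the equilibrium no player can improve, so the quantity vanishes; away from it, the size of the achievable improvement is what the analysis ties to the convergence speed.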

Highlights

- This paper considers the problem of computing a Nash equilibrium for two-player zero-sum games in two types of games: normal-form games and imperfect information games (IIGs) in extensive form
- We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), which updates the policy of each player by following a step of mirror-ascent for maximizing its expected reward against an improved policy for the opponent
- Examples of improved policies are the greedy policy, a multi-step improved policy, such as in Monte Carlo Tree Search (MCTS), a policy improved by policy gradient, or by any other reinforcement learning or search algorithm
- We introduced a new class of algorithms for computing a Nash equilibrium in zero-sum normal form games and sequential information games and provided an analysis of the speed of convergence in terms of the notion of improvement
- We show a new trade-off between the computational complexity of computing improved policies and the speed of convergence to the set of Nash equilibria. Under some conditions, exponential convergence is achieved when the best response is used as the improved policy
- Perhaps the main contribution of Mirror Ascent against an Improved Opponent is that it offers a principled approach for using any reinforcement learning policy-improvement technique to generate a sequence of policies with convergence guarantees to the set of Nash equilibria
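The update described above can be sketched for a normal-form game. The following is a simplified, illustrative implementation under stated assumptions (entropy mirror map, i.e. multiplicative weights, and a pure-strategy best response as the improved opponent), not the paper's exact algorithm: each player takes one mirror-ascent step against an improved version of the opponent's current policy.

```python
import numpy as np

# Rock-paper-scissors payoff for the row player (x maximizes x^T A y).
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])

def best_response_row(y, A):
    """Pure-strategy best response of the row (max) player to y."""
    br = np.zeros(A.shape[0])
    br[np.argmax(A @ y)] = 1.0
    return br

def best_response_col(x, A):
    """Pure-strategy best response of the column (min) player to x."""
    br = np.zeros(A.shape[1])
    br[np.argmin(x @ A)] = 1.0
    return br

def maio_step(x, y, A, eta):
    """One mirror-ascent step for each player against an improved
    (here: best-responding) opponent, with the entropy mirror map."""
    y_improved = best_response_col(x, A)   # opponent improved against x
    x_improved = best_response_row(y, A)   # opponent improved against y
    x_new = x * np.exp(eta * (A @ y_improved))   # ascent for the row player
    y_new = y * np.exp(-eta * (x_improved @ A))  # descent for the column player
    return x_new / x_new.sum(), y_new / y_new.sum()

x = np.array([0.6, 0.2, 0.2])
y = np.array([0.2, 0.5, 0.3])
for t in range(500):
    x, y = maio_step(x, y, A, eta=0.05)
```

A multiplicative-weights step along a linear objective can only increase the row player's payoff against the fixed improved opponent; the stronger last-iterate rates discussed above require the conditions established in the paper.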

Conclusion

- The authors introduced a new class of algorithms for computing a Nash equilibrium in zero-sum normal form games and sequential IIGs and provided an analysis of the speed of convergence in terms of the notion of improvement.
- The authors observe exponential convergence with a rate that depends on ε (Fig. 1(a)) and on the constant c (Fig. 1(b)) used in the learning rate (i.e., the authors chose ηt = c · t).
- This is exactly what is predicted by the theory, since the value of κ in Lemma 1 is ε/√2 here.
