What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization?
ICML, pp. 4880–4889, 2020.
Abstract:
Minimax optimization has found extensive applications in modern machine learning, in settings such as generative adversarial networks (GANs), adversarial training and multi-agent reinforcement learning. As most of these applications involve continuous nonconvex-nonconcave formulations, a very basic question arises—"what is a proper definition of local optima?"
Introduction
- Minimax optimization refers to problems of two agents—one agent tries to minimize the payoff function f : X × Y → R while the other agent tries to maximize it.
- In the last few years, minimax optimization has found significant applications in machine learning, in settings such as generative adversarial networks (GANs) [Goodfellow et al., 2014], adversarial training [Madry et al., 2017] and multi-agent reinforcement learning [Omidshafiei et al., 2017].
- These minimax problems are often solved using gradient-based algorithms, especially gradient descent ascent (GDA), an algorithm that alternates between a gradient descent step for x and some number of gradient ascent steps for y.
- Most previous work [e.g., Daskalakis and Panageas, 2018; Mazumdar and Ratliff, 2018; Adolphs et al., 2018] studied a notion of local Nash equilibrium, which replaces the global minima and maxima in the definition of Nash equilibrium with their local counterparts.
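To make the GDA dynamics concrete, here is a minimal sketch (not code from the paper) of simultaneous gradient descent ascent; the step sizes and the toy payoff f(x, y) = x² − y² + xy are hypothetical choices for illustration:

```python
import numpy as np

def gda(grad_x, grad_y, x0, y0, eta_x=0.01, eta_y=0.01, steps=1000):
    """Simultaneous GDA: x takes a descent step while y takes an ascent step."""
    x, y = float(x0), float(y0)
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - eta_x * gx   # descent step for the minimizing player
        y = y + eta_y * gy   # ascent step for the maximizing player
    return x, y

# Toy payoff f(x, y) = x^2 - y^2 + x*y, whose unique saddle point is (0, 0).
gx = lambda x, y: 2 * x + y
gy = lambda x, y: -2 * y + x
x_star, y_star = gda(gx, gy, 1.0, 1.0)
print(x_star, y_star)   # both converge toward 0
```

The paper's description also allows several ascent steps for y per descent step for x; the simultaneous variant above is the simplest instance.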
Highlights
- Minimax optimization refers to problems of two agents—one agent tries to minimize the payoff function f : X × Y → R while the other agent tries to maximize it
- Most of the minimax problems arising in modern machine learning applications do not have this simple convex-concave structure
- The main contribution of this paper is to propose the first proper mathematical definition of local optimality for this sequential setting—local minimax, a local surrogate for the global minimax points
- In Section 3.1, we develop a formal notion of local surrogacy for global minimax points which we refer to as local minimax points
- We consider general nonconvex-nonconcave minimax optimization problems. Since most of these problems arising in modern machine learning correspond to sequential games, we propose a new notion of local optimality—local minimax—the first proper mathematical definition of local optimality for the two-player sequential setting
- We establish a strong connection to gradient descent ascent—up to some degenerate points, local minimax points are exactly equal to the stable limit points of gradient descent ascent
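Second-order conditions of this flavor are what drive the connection to GDA: at a stationary point, a strict local minimax point is characterized by ∇²yy f being negative definite and the Schur complement ∇²xx f − ∇²xy f (∇²yy f)⁻¹ ∇²yx f being positive definite. The following numerical checker is a sketch under that characterization; the function names and interfaces are illustrative, not from the paper:

```python
import numpy as np

def is_strict_local_minimax(grad, hess, x, y, tol=1e-8):
    """Check second-order sufficient conditions for a strict local minimax
    point: stationarity, H_yy negative definite, and the Schur complement
    H_xx - H_xy H_yy^{-1} H_yx positive definite.
    `grad` returns the full gradient; `hess` returns (H_xx, H_xy, H_yy)."""
    if np.linalg.norm(grad(x, y)) > tol:
        return False                                  # not a stationary point
    Hxx, Hxy, Hyy = hess(x, y)
    if np.max(np.linalg.eigvalsh(Hyy)) >= -tol:       # need H_yy < 0
        return False
    schur = Hxx - Hxy @ np.linalg.solve(Hyy, Hxy.T)
    return bool(np.min(np.linalg.eigvalsh(schur)) > tol)  # need Schur > 0

# f(x, y) = x^2 - y^2: (0, 0) is a strict local minimax point.
grad = lambda x, y: np.array([2 * x[0], -2 * y[0]])
hess = lambda x, y: (np.array([[2.0]]), np.array([[0.0]]), np.array([[-2.0]]))
print(is_strict_local_minimax(grad, hess, np.array([0.0]), np.array([0.0])))  # True
```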
Results
- The authors pointed out that while many modern applications are sequential games, the problem of finding their optima—global minimax points—is NP-hard in general.
- In Section 3.1, the authors develop a formal notion of local surrogacy for global minimax points which the authors refer to as local minimax points.
- In Section 3.3, the authors establish a close relationship between stable fixed points of GDA and local minimax points.
- To the best of the authors' knowledge, this is the first proper mathematical definition of local optimality for the two-player sequential setting
Conclusion
- The authors consider general nonconvex-nonconcave minimax optimization problems.
- Since most of these problems arising in modern machine learning correspond to sequential games, the authors propose a new notion of local optimality—local minimax—the first proper mathematical definition of local optimality for the two-player sequential setting.
- The authors establish a strong connection to GDA—up to some degenerate points, local minimax points are exactly equal to the stable limit points of GDA
Related work
- Minimax optimization: Since the seminal paper of von Neumann [1928], notions of equilibria in games and their algorithmic computation have received wide attention. In terms of algorithmic computation, the vast majority of results focus on the convex-concave setting [Korpelevich, 1976, Nemirovski and Yudin, 1978, Nemirovski, 2004]. In the context of optimization, these problems have generally been studied in the setting of constrained convex optimization [Bertsekas, 2014]. Results beyond the convex-concave setting are much more recent. Rafique et al. [2018] and Nouiehed et al. [2019] consider nonconvex but concave minimax problems, where for any x, f(x, ·) is a concave function. In this case, they propose algorithms combining approximate maximization over y with a proximal gradient method for x, and show convergence to stationary points. Lin et al. [2018] consider a special case of the nonconvex-nonconcave minimax problem, in which the function f(·, ·) satisfies a variational inequality. In this setting, they analyze a proximal algorithm that requires solving certain strong variational inequality problems in each step, and show its convergence to stationary points. Hsieh et al. [2018] propose proximal methods that asymptotically converge to a mixed Nash equilibrium; i.e., a distribution rather than a point.
References
- Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, and Thomas Hofmann. Local saddle point optimization: A curvature exploitation approach. arXiv preprint arXiv:1805.05751, 2018.
- Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in generative adversarial nets (GANs). arXiv preprint arXiv:1703.00573, 2017.
- Dimitri Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, 2014.
- Nicolas Boumal, Vlad Voroninski, and Afonso Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In Advances in Neural Information Processing Systems, pages 2757–2765, 2016.
- Ronald Bruck Jr. On the weak convergence of an ergodic iteration for the solution of variational inequalities for monotone operators in Hilbert space. Journal of Mathematical Analysis and Applications, 61(1): 159–164, 1977.
- Sebastien Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015.
- Ashish Cherukuri, Bahman Gharesifard, and Jorge Cortes. Saddle-point dynamics: conditions for asymptotic stability of saddle points. SIAM Journal on Control and Optimization, 55(1):486–511, 2017.
- Constantinos Daskalakis and Ioannis Panageas. The limit points of (optimistic) gradient descent in min-max optimization. In Advances in Neural Information Processing Systems, pages 9256–9266, 2018.
- Damek Davis and Dmitriy Drusvyatskiy. Stochastic subgradient method converges at the rate O(k^{-1/4}) on weakly convex functions. arXiv preprint arXiv:1802.02988, 2018.
- Rong Ge, Chi Jin, and Yi Zheng. No spurious local minima in nonconvex low rank problems: A unified geometric analysis. In International Conference on Machine Learning, pages 1233–1242, 2017.
- Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Gabriel Huang, Remi Lepriol, Simon Lacoste-Julien, and Ioannis Mitliagkas. Negative momentum for improved game dynamics. arXiv preprint arXiv:1807.04740, 2018.
- Irving L Glicksberg. A further generalization of the Kakutani fixed point theorem, with application to Nash equilibrium points. Proceedings of the American Mathematical Society, 3(1):170–174, 1952.
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- Elad Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4): 157–325, 2016.
- Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017.
- Ya-Ping Hsieh, Chen Liu, and Volkan Cevher. Finding mixed Nash equilibria of generative adversarial networks. arXiv preprint arXiv:1811.02002, 2018.
- GM Korpelevich. The extragradient method for finding saddle points and other problems. Matecon, 12: 747–756, 1976.
- Qihang Lin, Mingrui Liu, Hassan Rafique, and Tianbao Yang. Solving weakly-convex-weakly-concave saddle-point problems as weakly-monotone variational inequality. arXiv preprint arXiv:1810.10207, 2018.
- Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
- Eric Mazumdar and Lillian J Ratliff. On the convergence of gradient-based learning in continuous games. arXiv preprint arXiv:1804.05464, 2018.
- Eric V Mazumdar, Michael I Jordan, and S Shankar Sastry. On finding local Nash equilibria (and only local Nash equilibria) in zero-sum games. arXiv preprint arXiv:1901.00838, 2019.
- Oskar Morgenstern and John Von Neumann. Theory of Games and Economic Behavior. Princeton University Press, 1953.
- Roger B Myerson. Game Theory. Harvard University Press, 2013.
- Vaishnavh Nagarajan and J Zico Kolter. Gradient descent GAN optimization is locally stable. In Advances in Neural Information Processing Systems, pages 5585–5595, 2017.
- Arkadi Nemirovski. Efficient methods for solving variational inequalities. Ekonomika i Matem. Metody, 17: 344–359, 1981.
- Arkadi Nemirovski. Prox-method with rate of convergence o(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
- Arkadi Nemirovski and D. Yudin. Cesari convergence of the gradient method for approximation saddle points of convex-concave functions. Doklady AN SSSR, 239:1056–1059, 1978.
- Maher Nouiehed, Maziar Sanjabi, Jason D Lee, and Meisam Razaviyayn. Solving a class of non-convex min-max games using iterative first order methods. arXiv preprint arXiv:1902.08297, 2019.
- Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P How, and John Vian. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. arXiv preprint arXiv:1703.06182, 2017.
- Hassan Rafique, Mingrui Liu, Qihang Lin, and Tianbao Yang. Non-convex min-max optimization: Provable algorithms and applications in machine learning. arXiv preprint arXiv:1810.02060, 2018.
- Ralph Tyrell Rockafellar. Convex Analysis. Princeton University Press, 2015.
- Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176, 1958.
- J. von Neumann. Zur Theorie der Gesellschaftsspiele [On the theory of games]. Mathematische Annalen, 100(1):295–320, 1928.
- Mishael Zedek. Continuity and location of zeros of linear combinations of polynomials. Proceedings of the American Mathematical Society, 16(1):78–84, 1965.
- Proposition 29 ([Glicksberg, 1952]). Assume that the function f : X × Y → R is continuous and that X ⊂ R^{d1}, Y ⊂ R^{d2} are compact. Then min_{μ∈P(X)} max_{ν∈P(Y)} E_{(x,y)∼(μ,ν)} f(x, y) = max_{ν∈P(Y)} min_{μ∈P(X)} E_{(x,y)∼(μ,ν)} f(x, y), where P(·) denotes the set of probability measures over the corresponding domain.
- Lemma 34 ([Rockafellar, 2015]). Assume the function φ is ℓ-weakly convex. Let λ < 1/ℓ, and denote x̂ = argmin_{x'} φ(x') + (1/2λ)‖x' − x‖². Then ‖∇φ_λ(x)‖ ≤ ε implies ‖x̂ − x‖ = λ‖∇φ_λ(x)‖ ≤ λε, and min_{g∈∂φ(x̂)} ‖g‖ ≤ ε.
- The proof of Theorem 35 is similar to the convergence analysis for nonsmooth weakly-convex functions [Davis and Drusvyatskiy, 2018], except that here the max-oracle has error. Theorem 35 claims that, apart from an additive error term (a result of the oracle solving the maximization only approximately), the remaining term decreases at a rate of 1/√T.
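A rough sketch of the algorithm this analysis concerns, gradient descent on x paired with an approximate max-oracle for y, is given below; the inner gradient-ascent loop standing in for the oracle, the toy payoff, and all step sizes are illustrative assumptions, not the paper's exact algorithm:

```python
def gd_with_max_oracle(grad_x, grad_y, x0, y0, eta_x=0.05, eta_y=0.1,
                       outer_steps=200, inner_steps=50):
    """Gradient descent on x, where y is first driven near argmax_y f(x, y)
    by an inner gradient-ascent loop (the approximate max-oracle); the
    oracle's residual error is the source of the additive error term."""
    x, y = float(x0), float(y0)
    for _ in range(outer_steps):
        for _ in range(inner_steps):      # approximate max-oracle for y
            y += eta_y * grad_y(x, y)
        x -= eta_x * grad_x(x, y)         # descent step on phi(x) = max_y f(x, y)
    return x, y

# Toy payoff f(x, y) = (x - 1)^2 - (y - x)^2: the inner max sits at y = x,
# so phi(x) = (x - 1)^2, minimized at x = 1.
gx = lambda x, y: 2 * (x - 1) + 2 * (y - x)
gy = lambda x, y: -2 * (y - x)
x_star, y_star = gd_with_max_oracle(gx, gy, 0.0, 0.0)
print(x_star, y_star)   # both approach 1
```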
- 2. Clearly, the gradient is equal to (0.2y, 0.2x + sin(y)). For any fixed x, there are only two local maxima y*(x), satisfying 0.2x + sin(y*) = 0, where y1(x) ∈ (−3π/2, −π/2) and y2(x) ∈ (π/2, 3π/2). On the other hand, f(x, y1(x)) is monotonically decreasing in x, while f(x, y2(x)) is monotonically increasing, with f(0, y1(0)) = f(0, y2(0)) by symmetry. It is not hard to check that y1(0) = −π and y2(0) = π. Therefore, (0, −π) and (0, π) are the two global solutions of the minimax problem. However, the gradients at both points are nonzero, so they are not stationary points; by Proposition 18 they are therefore also not local minimax points.
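The claim that these global minimax points are not stationary is easy to verify numerically. The payoff f(x, y) = 0.2xy − cos(y) below is inferred from the stated gradient (0.2y, 0.2x + sin(y)) and may differ from the paper's formula by a constant:

```python
import numpy as np

# Payoff inferred from the stated gradient (0.2y, 0.2x + sin(y)).
f = lambda x, y: 0.2 * x * y - np.cos(y)
grad = lambda x, y: np.array([0.2 * y, 0.2 * x + np.sin(y)])

for x0, y0 in [(0.0, -np.pi), (0.0, np.pi)]:
    g = grad(x0, y0)
    print((x0, y0), g, np.allclose(g, 0))   # gradient (∓0.2π, 0) is nonzero

# Symmetry check: the two global minimax points attain the same payoff.
print(np.isclose(f(0.0, -np.pi), f(0.0, np.pi)))  # True
```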