What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization?

ICML, pp. 4880-4889, 2019.

Keywords: minimax optimization, gradient descent ascent, adversarial training, limit point, minimax problem
The main contribution of this paper is to propose a proper mathematical definition of local optimality for this sequential setting---local minimax---as well as to present its properties and existence results.

Abstract:

Minimax optimization has found extensive applications in modern machine learning, in settings such as generative adversarial networks (GANs), adversarial training and multi-agent reinforcement learning. As most of these applications involve continuous nonconvex-nonconcave formulations, a very basic question arises---``what is a proper def...

Introduction
  • Minimax optimization refers to problems of two agents—one agent tries to minimize the payoff function f : X × Y → R while the other agent tries to maximize it.
  • In the last few years, minimax optimization has found significant applications in machine learning, in settings such as generative adversarial networks (GANs) [Goodfellow et al., 2014], adversarial training [Madry et al., 2017] and multi-agent reinforcement learning [Omidshafiei et al., 2017].
  • These minimax problems are often solved using gradient-based algorithms, especially gradient descent ascent (GDA), an algorithm that alternates between a gradient descent step for x and some number of gradient ascent steps for y.
  • Most previous work [e.g., Daskalakis and Panageas, 2018, Mazumdar and Ratliff, 2018, Adolphs et al., 2018] studied a notion of local Nash equilibrium, which replaces all the global minima or maxima in the definition of Nash equilibrium by their local counterparts.
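The GDA scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' exact algorithm: the step sizes, the number of inner ascent steps, and the toy payoff f(x, y) = x² − y² (whose origin is a minimax point) are all assumptions made for the example.

```python
def gda(grad_x, grad_y, x, y, eta_x=0.1, eta_y=0.1, ascent_steps=1, iters=200):
    """Gradient descent ascent: descend in x, ascend in y."""
    for _ in range(iters):
        for _ in range(ascent_steps):      # some number of ascent steps for y
            y = y + eta_y * grad_y(x, y)
        x = x - eta_x * grad_x(x, y)       # one descent step for x
    return x, y

# Toy payoff f(x, y) = x**2 - y**2: (0, 0) is its minimax point.
gx = lambda x, y: 2 * x    # df/dx
gy = lambda x, y: -2 * y   # df/dy
x, y = gda(gx, gy, x=1.0, y=1.0)
# Both coordinates contract by the factor (1 - 2*eta) per iteration,
# so the iterates converge to the stable point (0, 0).
```

On this simple payoff GDA converges; in general, as the paper discusses, GDA can also cycle or diverge, which is part of the motivation for characterizing its stable limit points.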
Highlights
  • Minimax optimization refers to problems of two agents—one agent tries to minimize the payoff function f : X × Y → R while the other agent tries to maximize it
  • Most of the minimax problems arising in modern machine learning applications do not have this simple convex-concave structure
  • The main contribution of this paper is to propose the first proper mathematical definition of local optimality for this sequential setting—local minimax, a local surrogate for the global minimax points
  • In Section 3.1, we develop a formal notion of local surrogacy for global minimax points which we refer to as local minimax points
  • We consider general nonconvex-nonconcave minimax optimization problems. Since most of these problems arising in modern machine learning correspond to sequential games, we propose a new notion of local optimality—local minimax—the first proper mathematical definition of local optimality for the two-player sequential setting
  • We establish a strong connection to gradient descent ascent—up to some degenerate points, local minimax points are exactly equal to the stable limit points of gradient descent ascent
Results
  • The authors pointed out that while many modern applications are sequential games, the problem of finding their optima—global minimax points—is NP-hard in general.
  • In Section 3.1, the authors develop a formal notion of local surrogacy for global minimax points which the authors refer to as local minimax points.
  • In Section 3.3, the authors establish a close relationship between stable fixed points of GDA and local minimax points.
  • To the best of the authors' knowledge, this is the first proper mathematical definition of local optimality for the two-player sequential setting
Conclusion
  • The authors consider general nonconvex-nonconcave minimax optimization problems.
  • Since most of these problems arising in modern machine learning correspond to sequential games, the authors propose a new notion of local optimality—local minimax—the first proper mathematical definition of local optimality for the two-player sequential setting.
  • The authors establish a strong connection to GDA—up to some degenerate points, local minimax points are exactly equal to the stable limit points of GDA
Related work
  • Minimax optimization: Since the seminal paper of von Neumann [1928], notions of equilibria in games and their algorithmic computation have received wide attention. In terms of algorithmic computation, the vast majority of results focus on the convex-concave setting [Korpelevich, 1976, Nemirovski and Yudin, 1978, Nemirovski, 2004]. In the context of optimization, these problems have generally been studied in the setting of constrained convex optimization [Bertsekas, 2014]. Results beyond the convex-concave setting are much more recent. Rafique et al. [2018] and Nouiehed et al. [2019] consider nonconvex but concave minimax problems, where f(x, ·) is a concave function for every x; they propose algorithms that combine approximate maximization over y with a proximal gradient method for x, and show convergence to stationary points. Lin et al. [2018] consider a special case of the nonconvex-nonconcave minimax problem in which the function f(·, ·) satisfies a variational inequality; in this setting, they analyze a proximal algorithm that requires solving a strong variational inequality problem at each step, and show its convergence to stationary points. Hsieh et al. [2018] propose proximal methods that asymptotically converge to a mixed Nash equilibrium, i.e., a distribution rather than a point.
References
  • Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, and Thomas Hofmann. Local saddle point optimization: A curvature exploitation approach. arXiv preprint arXiv:1805.05751, 2018.
  • Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in generative adversarial nets (GANs). arXiv preprint arXiv:1703.00573, 2017.
  • Dimitri Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, 2014.
  • Nicolas Boumal, Vlad Voroninski, and Afonso Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In Advances in Neural Information Processing Systems, pages 2757–2765, 2016.
  • Ronald Bruck Jr. On the weak convergence of an ergodic iteration for the solution of variational inequalities for monotone operators in Hilbert space. Journal of Mathematical Analysis and Applications, 61(1):159–164, 1977.
  • Sebastien Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015.
  • Ashish Cherukuri, Bahman Gharesifard, and Jorge Cortes. Saddle-point dynamics: conditions for asymptotic stability of saddle points. SIAM Journal on Control and Optimization, 55(1):486–511, 2017.
  • Constantinos Daskalakis and Ioannis Panageas. The limit points of (optimistic) gradient descent in min-max optimization. In Advances in Neural Information Processing Systems, pages 9256–9266, 2018.
  • Damek Davis and Dmitriy Drusvyatskiy. Stochastic subgradient method converges at the rate O(k^{-1/4}) on weakly convex functions. arXiv preprint arXiv:1802.02988, 2018.
  • Rong Ge, Chi Jin, and Yi Zheng. No spurious local minima in nonconvex low rank problems: A unified geometric analysis. In International Conference on Machine Learning, pages 1233–1242, 2017.
  • Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Gabriel Huang, Remi Lepriol, Simon Lacoste-Julien, and Ioannis Mitliagkas. Negative momentum for improved game dynamics. arXiv preprint arXiv:1807.04740, 2018.
  • Irving L. Glicksberg. A further generalization of the Kakutani fixed point theorem, with application to Nash equilibrium points. Proceedings of the American Mathematical Society, 3(1):170–174, 1952.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
  • Elad Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157–325, 2016.
  • Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017.
  • Ya-Ping Hsieh, Chen Liu, and Volkan Cevher. Finding mixed Nash equilibria of generative adversarial networks. arXiv preprint arXiv:1811.02002, 2018.
  • G. M. Korpelevich. The extragradient method for finding saddle points and other problems. Matecon, 12:747–756, 1976.
  • Qihang Lin, Mingrui Liu, Hassan Rafique, and Tianbao Yang. Solving weakly-convex-weakly-concave saddle-point problems as weakly-monotone variational inequality. arXiv preprint arXiv:1810.10207, 2018.
  • Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • Eric Mazumdar and Lillian J. Ratliff. On the convergence of gradient-based learning in continuous games. arXiv preprint arXiv:1804.05464, 2018.
  • Eric V. Mazumdar, Michael I. Jordan, and S. Shankar Sastry. On finding local Nash equilibria (and only local Nash equilibria) in zero-sum games. arXiv preprint arXiv:1901.00838, 2019.
  • Oskar Morgenstern and John von Neumann. Theory of Games and Economic Behavior. Princeton University Press, 1953.
  • Roger B. Myerson. Game Theory. Harvard University Press, 2013.
  • Vaishnavh Nagarajan and J. Zico Kolter. Gradient descent GAN optimization is locally stable. In Advances in Neural Information Processing Systems, pages 5585–5595, 2017.
  • Arkadi Nemirovski. Efficient methods for solving variational inequalities. Ekonomika i Matem. Metody, 17:344–359, 1981.
  • Arkadi Nemirovski. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
  • Arkadi Nemirovski and D. Yudin. Cesari convergence of the gradient method of approximating saddle points of convex-concave functions. Doklady AN SSSR, 239:1056–1059, 1978.
  • Maher Nouiehed, Maziar Sanjabi, Jason D. Lee, and Meisam Razaviyayn. Solving a class of non-convex min-max games using iterative first order methods. arXiv preprint arXiv:1902.08297, 2019.
  • Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, and John Vian. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. arXiv preprint arXiv:1703.06182, 2017.
  • Hassan Rafique, Mingrui Liu, Qihang Lin, and Tianbao Yang. Non-convex min-max optimization: Provable algorithms and applications in machine learning. arXiv preprint arXiv:1810.02060, 2018.
  • Ralph Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 2015.
  • Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176, 1958.
  • J. von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295–320, 1928.
  • Mishael Zedek. Continuity and location of zeros of linear combinations of polynomials. Proceedings of the American Mathematical Society, 16(1):78–84, 1965.
  • Proposition 29 ([Glicksberg, 1952]). Assume that the function f : X × Y → R is continuous and that X ⊂ R^{d_1}, Y ⊂ R^{d_2} are compact. Then min_{μ ∈ P(X)} max_{ν ∈ P(Y)} E_{(μ,ν)} f(x, y) = max_{ν ∈ P(Y)} min_{μ ∈ P(X)} E_{(μ,ν)} f(x, y), where P(X), P(Y) denote probability measures over X and Y.
  • Lemma 34 ([Rockafellar, 2015]). Assume the function φ is ℓ-weakly convex. Let λ < 1/ℓ, and denote x̂ = argmin_{x'} φ(x') + (1/2λ)‖x' − x‖². Then ‖∇φ_λ(x)‖ ≤ ε implies: ‖x̂ − x‖ = λ‖∇φ_λ(x)‖ ≤ λε, and min_{g ∈ ∂φ(x̂)} ‖g‖ ≤ ε, where φ_λ denotes the Moreau envelope of φ.
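The Moreau-envelope identities in the lemma can be checked numerically. This is a minimal sketch under an assumption not in the text: we take φ(x) = |x| (convex, hence 0-weakly convex), whose proximal point is the well-known soft-thresholding map.

```python
# Moreau envelope of phi(x) = |x|: the proximal point is soft-thresholding,
# and grad phi_lambda(x) = (x - x_hat) / lambda.
def prox_abs(x, lam):
    # argmin over x' of |x'| + (1 / (2 * lam)) * (x' - x)**2
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def moreau_grad(x, lam):
    x_hat = prox_abs(x, lam)
    return (x - x_hat) / lam

lam, x = 0.5, 2.0
x_hat = prox_abs(x, lam)
g = moreau_grad(x, lam)
# Lemma's identity: the distance to the proximal point equals
# lam times the norm of the Moreau-envelope gradient.
```

Here ‖x̂ − x‖ = λ|∇φ_λ(x)|, and the minimal subgradient at x̂ (namely sign(x̂) = 1) is bounded by |∇φ_λ(x)|, matching the lemma.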
  • The proof of Theorem 35 is similar to the convergence analysis for nonsmooth weakly-convex functions [Davis and Drusvyatskiy, 2018], except that here the max-oracle has error ε. Theorem 35 claims that, apart from an additive error proportional to ε (a result of the oracle solving the maximization only approximately), the remaining term decreases at a rate of 1/√T.
  • Clearly, the gradient is equal to (0.2y, 0.2x + sin(y)). For any fixed x, there are only two maxima y*(x), satisfying 0.2x + sin(y*(x)) = 0, where y₁(x) ∈ (−3π/2, −π/2) and y₂(x) ∈ (π/2, 3π/2). On the other hand, f(x, y₁(x)) is monotonically decreasing in x, while f(x, y₂(x)) is monotonically increasing, with f(0, y₁(0)) = f(0, y₂(0)) by symmetry. It is not hard to check that y₁(0) = −π and y₂(0) = π. Therefore, (0, −π) and (0, π) are the two global solutions of the minimax problem. However, the gradients at both points are nonzero, so they are not stationary points; by Proposition 18 they are also not local minimax points.
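The claim that the gradient is nonzero at (0, ±π) is easy to check numerically. A minimal sketch, with one assumption: the payoff is reconstructed from the stated gradient as f(x, y) = 0.2xy − cos(y), which matches (0.2y, 0.2x + sin(y)) up to an additive constant.

```python
import math

# Gradient of f(x, y) = 0.2*x*y - cos(y), as stated in the text:
def grad(x, y):
    return (0.2 * y, 0.2 * x + math.sin(y))

gx, gy = grad(0.0, math.pi)
# gy = sin(pi) = 0, so y = pi is a critical point of the inner maximization,
# but gx = 0.2*pi != 0, so (0, pi) is not a stationary point of f.
```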