Fast Convergence of Regularized Learning in Games

Annual Conference on Neural Information Processing Systems (2015): 2989–2997


Abstract

We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games. When each player in a game uses an algorithm from our class, their individual regret decays at O(T^{-3/4}), while the sum of utilities converges to an approximate optimum at O(T^{-1}), improving on the worst-case O(T^{-1/2}) rates.
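The recency bias in question can be made concrete with Optimistic Hedge, one of the algorithms in the paper's class: it is vanilla Hedge except that the most recent utility vector is counted twice when forming the weights. Below is a minimal sketch; the class name, step size, and interface are illustrative choices, not a reproduction of the authors' pseudocode.

```python
import numpy as np

class OptimisticHedge:
    """Hedge over d actions with a one-step recency bias.

    Vanilla Hedge plays x_t(a) proportional to exp(eta * sum_{s<t} u_s(a));
    the optimistic variant counts the latest utility twice:
    x_t(a) proportional to exp(eta * (sum_{s<t} u_s(a) + u_{t-1}(a))).
    """

    def __init__(self, d, eta=0.1):
        self.eta = eta
        self.cum = np.zeros(d)   # cumulative past utilities
        self.last = np.zeros(d)  # most recent utility vector

    def play(self):
        # Recency bias: the last utility enters the score a second time.
        scores = self.eta * (self.cum + self.last)
        w = np.exp(scores - scores.max())  # shift for numerical stability
        return w / w.sum()

    def observe(self, u):
        u = np.asarray(u, dtype=float)
        self.cum += u
        self.last = u
```

Dropping the `self.last` term in `play` recovers vanilla Hedge, which is the natural baseline for the comparisons discussed below.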

Introduction
  • What happens when players in a game interact with one another, all of them acting independently and selfishly to maximize their own utilities? If the players are smart, the authors intuitively expect their utilities — both individually and as a group — to grow, perhaps even to approach the best possible.
  • When no-regret learning algorithms are played against one another, it can be shown that the sum of utilities approaches an approximate optimum [2, 18] and that the player strategies converge to an equilibrium under appropriate conditions [6, 1, 8], at rates governed by the regret bounds.
  • For two-player zero-sum games, Daskalakis et al. [4] developed a decentralized variant of Nesterov’s accelerated saddle point algorithm [15] and showed that each player’s average regret converges at the remarkable rate of O(log T / T).
Highlights
  • We present our main theoretical results characterizing a class of no-regret dynamics which lead to faster convergence in smooth games.
  • We have shown that simple dynamics enjoy rapid convergence in favorable environments when each player in the game uses an algorithm with the Regret bounded by Variation in Utilities (RVU) property.
  • There are a number of interesting questions and directions for future research suggested by our results, including the following. Convergence rates for vanilla Hedge: the fast rates of our paper do not apply to algorithms such as Hedge without modification.
  • Convergence of players’ strategies: the optimistic follow-the-regularized-leader (OFTRL) algorithm often produces much more stable trajectories empirically, with the players converging to an equilibrium, as opposed to, say, Hedge; a toy comparison is sketched after this list.
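As a rough illustration of that stability claim, the script below runs both players of matching pennies under Hedge and under its optimistic variant, and reports how much player 1's mixed strategy still moves in late rounds. The game, step size, symmetry-breaking initialization, and movement metric are our choices for illustration; this is not the paper's 4-bidder auction experiment.

```python
import numpy as np

def run(T=2000, eta=0.1, optimistic=True):
    """Both players of matching pennies run (optimistic) Hedge.

    Returns the per-round movement |x_t - x_{t-1}|_1 of player 1's
    mixed strategy, a crude proxy for trajectory stability.
    """
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])        # payoff matrix for player 1
    cum1, cum2 = np.array([0.5, 0.0]), np.zeros(2)  # break symmetry slightly
    last1, last2 = np.zeros(2), np.zeros(2)
    prev, moves = None, []
    b = 1.0 if optimistic else 0.0                  # weight on the recency term
    for _ in range(T):
        s1, s2 = eta * (cum1 + b * last1), eta * (cum2 + b * last2)
        x1 = np.exp(s1 - s1.max()); x1 /= x1.sum()
        x2 = np.exp(s2 - s2.max()); x2 /= x2.sum()
        u1 = A @ x2            # expected utility per action, player 1
        u2 = -(A.T @ x1)       # zero-sum: player 2 gets the negation
        cum1 += u1; last1 = u1
        cum2 += u2; last2 = u2
        if prev is not None:
            moves.append(np.abs(x1 - prev).sum())
        prev = x1
    return np.array(moves)

if __name__ == "__main__":
    for opt in (False, True):
        label = "optimistic" if opt else "vanilla"
        print(label, "late-round movement:", run(optimistic=opt)[-500:].mean())
```

On runs like this one would expect the optimistic trajectories to settle near the mixed equilibrium while vanilla Hedge keeps oscillating, consistent with the qualitative claim above.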
Results
  • The authors simulate a 4-bidder simultaneous auction game, and compare the optimistic algorithms against Hedge [7] in terms of utilities, regrets and convergence to equilibria.
  • For some natural classes of no-regret algorithms, the average welfare converges at the much faster rate of O(log d / T).
  • The authors' main result in this section bounds this sum when each player uses dynamics with the RVU property; the shape of the bound is sketched after this list.
  • Unlike previous results [4, 17], the theorem does not require that all players use the same no-regret algorithm, as long as each player’s algorithm satisfies the RVU property with a common bound on the constants.
  • Rakhlin and Sridharan [16] analyze an FTRL variant, but require a self-concordant barrier for the constraint set as opposed to an arbitrary strongly convex regularizer, and their bound is missing the crucial negative terms of the RVU property which are essential for obtaining Theorem 4.
  • Rakhlin and Sridharan [17] address this concern by modifying the OMD algorithm with additional smoothing and adaptive step sizes, so as to preserve the fast rates in the favorable case while still guaranteeing low regret for each player, no matter how the opponents play.
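For reference, the RVU property and the shape of the resulting bound can be written out as follows. The definition matches the paper's; the final step relating β and γ is paraphrased from memory, and the exact constant (which depends on the number of players N) should be checked against the paper.

```latex
% RVU property: an algorithm producing plays x_1,\dots,x_T against
% utility vectors u_1,\dots,u_T has regret bounded, for parameters
% \alpha > 0, \beta, \gamma \ge 0 and dual norms (\|\cdot\|, \|\cdot\|_*), by
\[
  \max_{x^*} \sum_{t=1}^{T} \langle x^* - x_t,\, u_t \rangle
  \;\le\; \alpha
  \;+\; \beta \sum_{t=1}^{T} \|u_t - u_{t-1}\|_*^2
  \;-\; \gamma \sum_{t=1}^{T} \|x_t - x_{t-1}\|^2 .
\]
% Summing over the N players and using that player i's utility vector
% changes only when the opponents' strategies move,
%   \|u_{i,t} - u_{i,t-1}\|_* \lesssim \sum_{j \ne i} \|x_{j,t} - x_{j,t-1}\| ,
% the \beta-terms of each player are absorbed by the \gamma-terms of the
% others once \beta is sufficiently small relative to \gamma, giving
\[
  \sum_{i=1}^{N} \mathrm{Reg}_i \;\le\; N\alpha \;=\; O(1),
  \qquad\text{hence } O(1/T) \text{ convergence of the average welfare.}
\]
```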
Conclusion
  • The authors present a generic way of transforming an algorithm which satisfies the RVU property so that it retains the fast convergence in favorable settings, but always guarantees a worst-case regret of O(T^{-1/2}).
  • The key reason is the following: the proof that no-regret dynamics are approximately efficient (Proposition 2) only relies on the fact that each player has no regret against the strategy s_i^* used in the definition of a smooth game; a sketch of this derivation follows the list.
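A compact version of that argument uses Roughgarden's (λ, μ)-smoothness [18] in its utility-game form; the derivation below is a standard reconstruction of the idea, not a quotation of Proposition 2.

```latex
% (\lambda,\mu)-smoothness: there is a strategy profile s^* such that
% for every profile s,
%   \sum_i u_i(s_i^*, s_{-i}) \;\ge\; \lambda \, \mathrm{OPT} - \mu \, W(s),
% where W(s) = \sum_i u_i(s) is the welfare.  If player i's total regret
% against the fixed strategy s_i^* is \mathrm{Reg}_i, then summing the
% regret inequality
%   \sum_t u_i(s_i^*, s_{-i,t}) \le \sum_t u_i(s_t) + \mathrm{Reg}_i
% over players and applying smoothness at each round t gives
\[
  \lambda \, T \, \mathrm{OPT} \;-\; \mu \sum_{t=1}^{T} W(s_t)
  \;\le\; \sum_{t=1}^{T} W(s_t) \;+\; \sum_i \mathrm{Reg}_i ,
\]
\[
  \frac{1}{T}\sum_{t=1}^{T} W(s_t)
  \;\ge\; \frac{\lambda}{1+\mu}\,\mathrm{OPT}
  \;-\; \frac{1}{(1+\mu)\,T}\sum_i \mathrm{Reg}_i .
\]
% Only regret against s_i^* is needed, so faster regret decay translates
% directly into faster convergence to approximate efficiency.
```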
References
  • Avrim Blum and Yishay Mansour. Learning, regret minimization, and equilibria. In Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay Vazirani, editors, Algorithmic Game Theory, chapter 4, pages 4–30. Cambridge University Press, 2007.
  • Avrim Blum, MohammadTaghi Hajiaghayi, Katrina Ligett, and Aaron Roth. Regret minimization and the price of total anarchy. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC ’08, pages 373–382, New York, NY, USA, 2008. ACM.
  • Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA, 2006.
  • Constantinos Daskalakis, Alan Deckelbaum, and Anthony Kim. Near-optimal no-regret algorithms for zero-sum games. Games and Economic Behavior, 92:327–348, 2014.
  • Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. Internet advertising and the generalized second price auction: Selling billions of dollars worth of keywords. Working Paper 11765, National Bureau of Economic Research, November 2005.
  • Dean P. Foster and Rakesh V. Vohra. Calibrated learning and correlated equilibrium. Games and Economic Behavior, 21(1–2):40–55, 1997.
  • Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
  • Yoav Freund and Robert E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1):79–103, 1999.
  • Drew Fudenberg and Alexander Peysakhovich. Recency, records and recaps: Learning and nonequilibrium behavior in a simple decision problem. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, EC ’14, pages 971–986, New York, NY, USA, 2014. ACM.
  • Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127–1150, 2000.
  • Wassily Hoeffding and J. Wolfowitz. Distinguishability of sets of distributions. Ann. Math. Statist., 29(3):700–718, 1958.
  • Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
  • Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.
  • A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, 1983.
  • Yu. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005.
  • Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. In COLT 2013, pages 993–1019, 2013.
  • Alexander Rakhlin and Karthik Sridharan. Optimization, learning, and games with predictable sequences. In Advances in Neural Information Processing Systems, pages 3066–3074, 2013.
  • Tim Roughgarden. Intrinsic robustness of the price of anarchy. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pages 513–522, New York, NY, USA, 2009. ACM.
  • Shai Shalev-Shwartz. Online learning and online convex optimization. Found. Trends Mach. Learn., 4(2):107–194, February 2012.
  • Vasilis Syrgkanis and Eva Tardos. Composable and efficient mechanisms. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC ’13, pages 211–220, New York, NY, USA, 2013. ACM.