Fast Convergence of Regularized Learning in Games
Annual Conference on Neural Information Processing Systems (2015): 2989–2997
We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games. When each player in a game uses an algorithm from our class, their individual regret decays at O(T^{-3/4}), while the sum of utilities converges to an approximate optimum at O(1/T), an improvement upon the worst-case O(1/√T) rates.
- What happens when players in a game interact with one another, all of them acting independently and selfishly to maximize their own utilities? If the players are smart, the authors intuitively expect their utilities, both individually and as a group, to grow, perhaps even to approach the best possible.
- When played against one another, it can be shown that the sum of utilities approaches an approximate optimum [2, 18], and the player strategies converge to an equilibrium under appropriate conditions [6, 1, 8], at rates governed by the regret bounds.
- For two-player zero-sum games, Daskalakis et al. [4] developed a decentralized variant of Nesterov’s accelerated saddle point algorithm and showed that each player’s average regret converges at the remarkable rate of O(log T / T).
- We present our main theoretical results characterizing a class of no-regret dynamics which lead to faster convergence in smooth games.
- We have shown simple dynamics with rapid convergence properties in favorable environments when each player in the game uses an algorithm with the Regret bounded by Variation in Utilities (RVU) property.
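For reference, the RVU property can be stated roughly as follows (a restatement from memory of the paper's definition; readers should check the exact constants and norm conditions against the paper itself):

```latex
% Regret bounded by Variation in Utilities (RVU): an online algorithm
% producing iterates w_1, \dots, w_T against utility vectors
% u_1, \dots, u_T satisfies RVU with parameters \alpha > 0,
% 0 < \gamma \le \beta, and a pair of dual norms (\|\cdot\|, \|\cdot\|_*) if
\sum_{t=1}^{T} \langle w^* - w_t,\, u_t \rangle
  \;\le\; \alpha
  \;+\; \beta \sum_{t=1}^{T} \| u_t - u_{t-1} \|_*^2
  \;-\; \gamma \sum_{t=1}^{T} \| w_t - w_{t-1} \|^2
  \qquad \text{for every comparator } w^*.
```

The negative term in the strategy movement is what cancels against the other players' utility-variation terms when the bound is summed over all players, which is how the fast rates arise.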
- There are a number of interesting questions and directions for future research which are suggested by our results, including the following: Convergence rates for vanilla Hedge: The fast rates of our paper do not apply to algorithms such as Hedge without modification.
- Convergence of players’ strategies: The optimistic follow the regularized leader (OFTRL) algorithm often produces much more stable trajectories empirically, with the players converging to an equilibrium, as opposed to, say, Hedge.
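As an illustration, one algorithm of this flavor, Optimistic Hedge (Hedge with a recency bias: the most recent utility vector is counted twice), can be sketched as follows. This is our own minimal sketch; the function name and fixed step size are illustrative choices, not from the paper:

```python
import numpy as np

def optimistic_hedge(utilities, eta):
    """Play a sequence of simplex strategies with Optimistic Hedge.

    At round t the weights are proportional to
    exp(eta * (u_1 + ... + u_{t-1} + u_{t-1})): ordinary Hedge on the
    cumulative utilities, with the most recent utility vector counted
    twice (the recency bias / predictable-sequence term).
    """
    d = len(utilities[0])
    cumulative = np.zeros(d)
    last = np.zeros(d)
    plays = []
    for u in utilities:
        logits = eta * (cumulative + last)  # recency bias: last utility twice
        logits -= logits.max()              # numerical stability
        w = np.exp(logits)
        w /= w.sum()
        plays.append(w)
        u = np.asarray(u, dtype=float)
        cumulative += u
        last = u
    return plays
```

Replacing `cumulative + last` with just `cumulative` recovers vanilla Hedge, which makes the "recency bias" difference between the two dynamics explicit.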
- The authors simulate a 4-bidder simultaneous auction game, and compare the optimistic algorithms against Hedge in terms of utilities, regrets and convergence to equilibria.
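A minimal version of such an experiment, on a toy two-action zero-sum game rather than the paper's 4-bidder auction (the payoff matrix, step size, and horizon below are our own illustrative choices), might look like:

```python
import numpy as np

def hedge_step(eta, cumulative, last, optimistic):
    # Hedge weights over cumulative utilities; the optimistic variant
    # counts the most recent utility vector twice (recency bias).
    logits = eta * (cumulative + (last if optimistic else 0.0))
    logits -= logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def self_play(A, T, eta, optimistic):
    """Both players run (Optimistic) Hedge against each other on the
    two-player zero-sum game with row-player payoff matrix A."""
    d1, d2 = A.shape
    cum1, last1 = np.zeros(d1), np.zeros(d1)
    cum2, last2 = np.zeros(d2), np.zeros(d2)
    plays1, plays2, utils1, utils2 = [], [], [], []
    for _ in range(T):
        w1 = hedge_step(eta, cum1, last1, optimistic)
        w2 = hedge_step(eta, cum2, last2, optimistic)
        u1 = A @ w2       # row player's expected utility per action
        u2 = -(A.T @ w1)  # column player's utilities (zero-sum)
        plays1.append(w1); utils1.append(u1)
        plays2.append(w2); utils2.append(u2)
        cum1 += u1; last1 = u1
        cum2 += u2; last2 = u2
    return plays1, plays2, utils1, utils2

def average_regret(plays, utils):
    # Regret of the realized strategy sequence against the best fixed action.
    best_fixed = float(np.max(np.sum(utils, axis=0)))
    realized = sum(float(w @ u) for w, u in zip(plays, utils))
    return (best_fixed - realized) / len(plays)
```

Running `self_play` with `optimistic=True` and `optimistic=False` and comparing `average_regret` and the strategy trajectories is the shape of the comparison the authors report, though their actual experiments use the richer auction game.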
- For some natural classes of no-regret algorithms the average welfare converges at the much faster rate of log(d)/T.
- The authors' main result in this section bounds this sum when each player uses dynamics with the RVU property.
- Unlike previous results [4, 17], the theorem does not require that all players use the same no-regret algorithm, as long as each player’s algorithm satisfies the RVU property with a common bound on the constants.
- Rakhlin and Sridharan analyze an FTRL variant, but require a self-concordant barrier for the constraint set as opposed to an arbitrary strongly convex regularizer, and their bound is missing the crucial negative terms of the RVU property which are essential for obtaining Theorem 4.
- Rakhlin and Sridharan address this concern by modifying the OMD algorithm with additional smoothing and adaptive step-sizes so as to preserve the fast rates in the favorable case while still guaranteeing regret for each player, no matter how the opponents play.
- The authors present a generic way of transforming an algorithm which satisfies the RVU property so that it retains the fast convergence in favorable settings, but always guarantees a worst-case average regret of Õ(T^{-1/2}) against arbitrary opponents.
- The key reason is the following: the proof that no-regret dynamics are approximately efficient (Proposition 2) only relies on the fact that each player does not have regret against the strategy s*_i used in the definition of a smooth game.
- A. Blum and Y. Mansour. Learning, regret minimization, and equilibria. In Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay Vazirani, editors, Algorithmic Game Theory, chapter 4, pages 4–30. Cambridge University Press, 2007.
- Avrim Blum, MohammadTaghi Hajiaghayi, Katrina Ligett, and Aaron Roth. Regret minimization and the price of total anarchy. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC ’08, pages 373–382, New York, NY, USA, 2008. ACM.
- Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA, 2006.
- Constantinos Daskalakis, Alan Deckelbaum, and Anthony Kim. Near-optimal no-regret algorithms for zero-sum games. Games and Economic Behavior, 92:327–348, 2014.
- Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. Internet advertising and the generalized second price auction: Selling billions of dollars worth of keywords. Working Paper 11765, National Bureau of Economic Research, November 2005.
- Dean P. Foster and Rakesh V. Vohra. Calibrated learning and correlated equilibrium. Games and Economic Behavior, 21(1–2):40–55, 1997.
- Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
- Yoav Freund and Robert E Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1):79–103, 1999.
- Drew Fudenberg and Alexander Peysakhovich. Recency, records and recaps: Learning and nonequilibrium behavior in a simple decision problem. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, EC ’14, pages 971–986, New York, NY, USA, 2014. ACM.
- Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127–1150, 2000.
- Wassily Hoeffding and J. Wolfowitz. Distinguishability of sets of distributions. Ann. Math. Statist., 29(3):700–718, 1958.
- Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
- Nick Littlestone and Manfred K Warmuth. The weighted majority algorithm. Information and computation, 108(2):212–261, 1994.
- A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, 1983.
- Yu. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005.
- Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. In COLT 2013, pages 993–1019, 2013.
- Alexander Rakhlin and Karthik Sridharan. Optimization, learning, and games with predictable sequences. In Advances in Neural Information Processing Systems, pages 3066–3074, 2013.
- T. Roughgarden. Intrinsic robustness of the price of anarchy. In Proceedings of the 41st annual ACM symposium on Theory of computing, pages 513–522, New York, NY, USA, 2009. ACM.
- Shai Shalev-Shwartz. Online learning and online convex optimization. Found. Trends Mach. Learn., 4(2):107–194, February 2012.
- Vasilis Syrgkanis and Eva Tardos. Composable and efficient mechanisms. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC ’13, pages 211–220, New York, NY, USA, 2013. ACM.