Convergence of Multi-Scale Reinforcement Q-Learning Algorithms for Mean Field Game and Control Problems
arXiv (2023)
Abstract
We establish the convergence of the unified two-timescale Reinforcement
Learning (RL) algorithm presented by Angiuli et al. This algorithm provides
solutions to Mean Field Game (MFG) or Mean Field Control (MFC) problems
depending on the ratio of two learning rates, one for the value function and
the other for the mean field term. We focus on a setting with finite state and
action spaces, discrete time and infinite horizon. The proof of convergence
relies on a generalization of the two-timescale approach of Borkar. The
accuracy of the approximation to the true solutions depends on the smoothing of the
policies. We then provide a numerical example illustrating the convergence.
Lastly, we generalize our convergence result to a three-timescale RL algorithm
introduced by Angiuli et al. to solve mixed Mean Field Control Games (MFCGs).
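To make the mechanism in the abstract concrete, the following is a minimal sketch of a unified two-timescale tabular Q-learning loop: a Q-function and a mean-field term are updated with two separate learning rates, and the smoothed (softmax) policy reflects the "smoothing of the policies" mentioned above. The state/action sizes, reward and transition functions, and rate values are illustrative assumptions, not the model or code of Angiuli et al.

import numpy as np

# Illustrative sketch (assumed names and parameters, not the authors' code) of
# a two-timescale mean-field Q-learning update: the Q-function is updated at
# rate rho_Q and the mean-field term mu at rate rho_mu; per the abstract, the
# ratio of the two rates selects between the MFG and MFC solutions.

n_states, n_actions = 5, 3
gamma = 0.9                      # discount factor (infinite horizon)
rho_Q, rho_mu = 0.1, 0.01        # assumed values; swapping the fast/slow roles
                                 # switches which problem the iterates target
tau = 0.5                        # softmax temperature: the policy smoothing

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
mu = np.full(n_states, 1.0 / n_states)   # mean-field term: a state distribution

def reward(s, a, mu):
    # Placeholder mean-field reward; the true model is problem-specific.
    return -abs(s - n_states * mu[s]) - 0.1 * a

def step(s, a):
    # Placeholder transition kernel.
    return int(rng.integers(n_states))

s = int(rng.integers(n_states))
for _ in range(10_000):
    # Smoothed (softmax) policy derived from the current Q-function.
    probs = np.exp(Q[s] / tau)
    probs /= probs.sum()
    a = rng.choice(n_actions, p=probs)
    s_next = step(s, a)
    # Two-timescale updates: value function fast, mean field slow (or vice versa).
    target = reward(s, a, mu) + gamma * Q[s_next].max()
    Q[s, a] += rho_Q * (target - Q[s, a])
    e = np.zeros(n_states)
    e[s_next] = 1.0
    mu += rho_mu * (e - mu)      # stochastic approximation of the state law
    s = s_next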