Convergence of Multi-Scale Reinforcement Q-Learning Algorithms for Mean Field Game and Control Problems

arXiv (2023)

Abstract
We establish the convergence of the unified two-timescale Reinforcement Learning (RL) algorithm presented by Angiuli et al. This algorithm provides solutions to Mean Field Game (MFG) or Mean Field Control (MFC) problems depending on the ratio of two learning rates, one for the value function and the other for the mean field term. We focus on a setting with finite state and action spaces, discrete time, and infinite horizon. The proof of convergence relies on a generalization of the two-timescale approach of Borkar. The accuracy of the approximation to the true solutions depends on the smoothing of the policies. We then provide a numerical example illustrating the convergence. Finally, we generalize our convergence result to a three-timescale RL algorithm introduced by Angiuli et al. to solve mixed Mean Field Control Games (MFCGs).
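To make the role of the learning-rate ratio concrete, here is a minimal tabular sketch of a unified two-timescale Q-learning loop in the spirit of the algorithm the abstract describes. The toy environment (`reward`, `transition`), the problem sizes, and all numerical values are placeholder assumptions for illustration, not the paper's model; the softmax temperature stands in for the policy smoothing mentioned above.

```python
import numpy as np

# Placeholder problem sizes and discount factor (assumptions, not the paper's model).
n_states, n_actions = 5, 3
gamma = 0.9
rng = np.random.default_rng(0)


def reward(x, a, mu):
    # Hypothetical mean-field reward: prefer the action matching the state,
    # penalize crowding at the current state.
    return -abs(x - a) - mu[x]


def transition(x, a):
    # Hypothetical dynamics: drift toward the action index, plus noise.
    step = np.sign(a - x) if a != x else 0
    return int(np.clip(x + step + rng.integers(-1, 2), 0, n_states - 1))


def softmax(q_row, temp):
    # Smoothed (softmax) policy; `temp` controls the smoothing.
    z = (q_row - q_row.max()) / temp
    p = np.exp(z)
    return p / p.sum()


def two_timescale_q_learning(rho_q, rho_mu, n_iter=50_000, temp=0.1):
    """Unified two-timescale loop: one learning rate per unknown.

    rho_q >> rho_mu: Q equilibrates against a quasi-frozen mean field
                     (MFG-type regime).
    rho_q << rho_mu: the mean field tracks the current policy
                     (MFC-type regime).
    """
    Q = np.zeros((n_states, n_actions))
    mu = np.full(n_states, 1.0 / n_states)  # estimate of the mean field term
    x = rng.integers(n_states)
    for _ in range(n_iter):
        a = rng.choice(n_actions, p=softmax(Q[x], temp))
        x_next = transition(x, a)
        r = reward(x, a, mu)
        # Value-function update, learning rate rho_q.
        Q[x, a] += rho_q * (r + gamma * Q[x_next].max() - Q[x, a])
        # Mean-field update toward the visited state, learning rate rho_mu.
        e = np.zeros(n_states)
        e[x_next] = 1.0
        mu += rho_mu * (e - mu)
        x = x_next
    return Q, mu


# Same code, two limits selected purely by the ratio of learning rates.
Q_mfg, mu_mfg = two_timescale_q_learning(rho_q=0.1, rho_mu=0.001)
Q_mfc, mu_mfc = two_timescale_q_learning(rho_q=0.001, rho_mu=0.1)
print(mu_mfg.round(3), mu_mfc.round(3))
```

The two regimes mirror the Borkar-style fast/slow decomposition that the paper's convergence proof generalizes: the faster variable equilibrates first, so the ratio of the two learning rates selects which fixed point (MFG or MFC) the same loop approximates.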