Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective
arXiv (2024)
Abstract
In this paper, we address Reinforcement Learning (RL) among agents that are grouped into teams, with cooperation within each team but general-sum (non-zero-sum) competition across different teams. To develop an RL
method that provably achieves a Nash equilibrium, we focus on a
linear-quadratic structure. Moreover, to tackle the non-stationarity induced by
multi-agent interactions in the finite population setting, we consider the case
where the number of agents within each team is infinite, i.e., the mean-field
setting. This results in a General-Sum LQ Mean-Field Type Game (GS-MFTG). We characterize the Nash equilibrium (NE) of the GS-MFTG under a standard invertibility condition. This MFTG NE is then shown to be an 𝒪(1/M)-NE for the finite-population game, where M is a lower bound on the number of
agents in each team. These structural results motivate an algorithm called
Multi-player Receding-horizon Natural Policy Gradient (MRPG), where each team
minimizes its cumulative cost independently in a receding-horizon manner.
Despite the non-convexity of the problem, we establish that the resulting
algorithm converges to a global NE through a novel problem decomposition into
sub-problems using backward recursive discrete-time Hamilton-Jacobi-Isaacs
(HJI) equations, in which independent natural policy gradient is shown to
exhibit linear convergence under time-independent diagonal dominance.
Experiments illuminate the merits of this approach in practice.
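To fix ideas, a representative linear-quadratic mean-field structure of the kind the abstract refers to can be written as follows; the notation is our own illustrative choice and is not copied from the paper:

\[
x^i_{t+1} = A\,x^i_t + \bar{A}\,\bar{x}^i_t + B\,u^i_t + w^i_t, \qquad
J^i(\pi^1,\dots,\pi^N) = \mathbb{E}\sum_{t=0}^{T-1}\Big[(x^i_t)^\top Q^i x^i_t + \bar{x}_t^\top \bar{Q}^i \bar{x}_t + (u^i_t)^\top R^i u^i_t\Big],
\]

where \(\bar{x}^i_t\) is the mean-field (population-average) state of team \(i\), \(\bar{x}_t\) stacks the mean-field states of all teams (which is how the teams' costs are coupled), and \(w^i_t\) is noise. A joint policy \((\pi^{1,*},\dots,\pi^{N,*})\) is a Nash equilibrium if \(J^i(\pi^{i,*},\pi^{-i,*}) \le J^i(\pi^i,\pi^{-i,*})\) for every team \(i\) and every admissible deviation \(\pi^i\).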
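As a concrete, minimal illustration of the natural-policy-gradient machinery that MRPG builds on, the sketch below runs natural policy gradient on a single-agent, infinite-horizon LQR toy problem, using the well-known update of Fazel et al. (2018). All matrices and step sizes here are hypothetical; this is not the paper's multi-team, receding-horizon MRPG algorithm, only the per-team gradient step underlying it.

import numpy as np

# Toy LQR instance (hypothetical numbers, chosen so that K = 0 is stabilizing).
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)   # state cost weight
R = np.eye(1)   # control cost weight

def value_matrix(K, iters=1000):
    # Policy evaluation: fixed-point iteration for the Lyapunov equation
    #   P = Q + K^T R K + (A - B K)^T P (A - B K).
    Acl = A - B @ K
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    return P

def natural_pg_step(K, eta=0.05):
    # Natural policy gradient for LQR: the preconditioned gradient is
    #   E_K = (R + B^T P_K B) K - B^T P_K A,  and the update is K <- K - 2*eta*E_K.
    P = value_matrix(K)
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    return K - 2.0 * eta * E

K = np.zeros((1, 2))   # linear feedback policy u_t = -K x_t
for _ in range(300):
    K = natural_pg_step(K)
print("learned feedback gain K:", K)

In the paper's setting this kind of step is applied independently by each team, stage by stage in a receding-horizon fashion, which is what the backward-recursive HJI decomposition in the abstract makes tractable.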