Offline Fictitious Self-Play for Competitive Games
arXiv (2024)
Abstract
Offline Reinforcement Learning (RL) has received significant interest due to
its ability to improve policies using previously collected datasets without online
interactions. Despite its success in the single-agent setting, offline
multi-agent RL remains a challenge, especially in competitive games. First,
without knowledge of the game structure, the learner cannot interact with opponents
or carry out self-play, a major learning paradigm for competitive games.
Second, real-world datasets cannot cover the entire state and action space of
the game, which makes it difficult to identify a Nash equilibrium (NE). To
address these issues, this paper introduces Off-FSP, the first practical
model-free offline RL algorithm for competitive games. We start by simulating
interactions with various opponents by adjusting the weights of the fixed
dataset with importance sampling. This technique allows us to learn best
responses to different opponents and employ the Offline Self-Play learning
framework. In this framework, we further implement Fictitious Self-Play (FSP)
to approximate an NE. On real-world datasets with partial coverage, our method shows
the potential to approach an NE when combined with any single-agent offline RL
method. Experimental results on Leduc Hold'em Poker show that our method
significantly improves performance compared with state-of-the-art baselines.
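
The reweighting idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes tabular opponent policies, a single logged opponent action per sample, and a known behavior policy, and the names `opponent_importance_weights`, `behavior_policy`, and `target_policy` are hypothetical. It only shows how importance sampling can make a fixed dataset look as if it had been collected against a different opponent.

```python
import numpy as np

def opponent_importance_weights(opp_states, opp_actions,
                                behavior_policy, target_policy):
    """Normalized weights target(a|s) / behavior(a|s) for logged opponent actions.

    Assumption: both policies are arrays of shape (num_states, num_actions)
    holding action probabilities. Real datasets would need an estimate of
    the behavior policy and, typically, a product over each trajectory.
    """
    target_p = target_policy[opp_states, opp_actions]
    behavior_p = behavior_policy[opp_states, opp_actions]
    # Guard against division by zero for actions the behavior policy never took.
    weights = target_p / np.maximum(behavior_p, 1e-8)
    return weights / weights.sum()

# Toy dataset: four logged opponent decisions over two states and two actions.
behavior = np.array([[0.5, 0.5], [0.5, 0.5]])  # uniform opponent that produced the data
target = np.array([[0.9, 0.1], [0.2, 0.8]])    # opponent we want to simulate
states = np.array([0, 0, 1, 1])
actions = np.array([0, 1, 0, 1])

w = opponent_importance_weights(states, actions, behavior, target)
print(w)
```

Samples whose logged opponent action is likely under the simulated opponent receive larger weights. A single-agent offline RL learner trained on the reweighted data would then, under these assumptions, approximate a best response to that opponent.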