RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning
arXiv (2024)
Abstract
Effective action abstraction is crucial in tackling challenges associated
with large action spaces in Imperfect Information Extensive-Form Games
(IIEFGs). However, due to the vast state space and computational complexity in
IIEFGs, existing methods often rely on fixed abstractions, resulting in
sub-optimal performance. In response, we introduce RL-CFR, a novel
reinforcement learning (RL) approach for dynamic action abstraction. RL-CFR
builds upon our innovative Markov Decision Process (MDP) formulation, with
states corresponding to public information and actions represented as feature
vectors indicating specific action abstractions. The reward is defined as the
expected payoff difference between the selected and default action
abstractions. RL-CFR constructs a game tree with RL-guided action abstractions
and utilizes counterfactual regret minimization (CFR) for strategy derivation.
Impressively, it can be trained from scratch, achieving higher expected payoff
without increased CFR solving time. In experiments on Heads-up No-limit Texas
Hold'em, RL-CFR outperforms a replication of ReBeL and Slumbot, demonstrating
significant win-rate margins of 64 ± 11 and 84 ± 17 mbb/hand,
respectively.
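
The reward described in the abstract, the expected payoff difference between the selected and the default action abstraction, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `Solver` signature, the `abstraction_reward` helper, the representation of an abstraction as a tuple of pot-fraction bet sizes, and especially `toy_solver`, which is an arbitrary stand-in scoring function and does not run CFR.

```python
from typing import Callable, Sequence

# Assumption: an action abstraction is a sequence of bet sizes (fractions of the pot).
Abstraction = Sequence[float]
# Hypothetical solver signature: (public_state, abstraction) -> expected payoff.
Solver = Callable[[str, Abstraction], float]


def abstraction_reward(solve: Solver, public_state: str,
                       selected: Abstraction, default: Abstraction) -> float:
    """Expected payoff under the selected abstraction minus that under the default."""
    return solve(public_state, selected) - solve(public_state, default)


def toy_solver(public_state: str, abstraction: Abstraction) -> float:
    """Stand-in for a CFR solve (this is NOT CFR).

    Scores an abstraction by how close its nearest bet size is to an
    arbitrary target of 0.75 pot. The public state is ignored here.
    """
    target = 0.75
    return -min(abs(size - target) for size in abstraction)


default = (0.5, 1.0)            # hypothetical fixed abstraction
selected = (0.33, 0.75, 1.5)    # hypothetical RL-chosen abstraction
reward = abstraction_reward(toy_solver, "flop:AhKd7c", selected, default)
```

In this sketch a positive reward means the selected abstraction scored higher than the default at that public state, and selecting the default itself yields a reward of zero. In a real setting the solver call would be replaced by an actual CFR solve over the game tree built from each abstraction.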