Adversarial Diversity in Hanabi

ICLR 2023

Abstract
Many Dec-POMDPs admit a qualitatively diverse set of ``reasonable'' joint policies. The diversity literature is concerned with generating these joint policies. Unfortunately, existing methods fail to produce teams of agents that are simultaneously diverse, high-performing, and ``reasonable''. In this work, we propose a novel approach to diverse policy generation for turn-based Dec-POMDPs with public actions, which relies on off-belief learning to encourage reasonableness and skill, and on ``repulsive'' fictitious transitions to encourage diversity. We use this approach to generate new agents with distinct but ``reasonable'' play styles for the card game Hanabi, as indicated by their non-sabotaging behaviour and the graceful degradation of their performance with ad-hoc partners. We open-source our agents so that they may be used as starting points for a test bed for future research on (ad-hoc) coordination.
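The abstract does not spell out the form of the ``repulsive'' term, so the following is only a rough, hypothetical sketch of one generic way a diversity repulsion could be added to a training loss: penalize the new policy for action-distribution overlap with each policy already in the pool (all names, the dot-product overlap measure, and the weight `lam` are illustrative assumptions, not the paper's objective).

```python
import numpy as np

def overlap(p, q):
    # Expected probability that pool policy q assigns to an action
    # sampled from the new policy p; 0 when the supports are disjoint,
    # 1 when both are identical one-hot distributions.
    return float(np.dot(p, q))

def repulsive_loss(task_loss, new_probs, pool, lam=0.1):
    # Hypothetical combined objective: the ordinary task loss plus a
    # repulsion term that grows when the new policy imitates any
    # previously generated policy in the pool.
    return task_loss + lam * sum(overlap(new_probs, q) for q in pool)

# A new policy disjoint from the pool incurs no repulsion penalty:
p_new = np.array([1.0, 0.0])
pool = [np.array([0.0, 1.0])]
print(repulsive_loss(0.0, p_new, pool))  # 0.0

# An identical policy is penalized by lam:
print(repulsive_loss(0.0, p_new, [p_new]))  # 0.1
```

In the paper's setting the repulsion is applied on fictitious transitions rather than on raw policy outputs, but the high-level shape (task objective plus a pool-overlap penalty) is the same idea.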
Keywords
coordination,diversity,multi-agent reinforcement learning