Generating and Adapting to Diverse Ad-Hoc Partners in Hanabi

Rodrigo Canaan, Xianbo Gao, Julian Togelius, Andy Nealen, Stefan Menzel

IEEE Transactions on Games (2022)

Abstract

Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect. In this paper, we focus on ad-hoc settings with no previous coordination between partners. We introduce a “Bayesian Meta-Agent” that maintains a belief distribution over hypotheses of partner policies. The policies that serve as initial hypotheses are generated using MAP-Elites to ensure behavioral diversity. We evaluate an “Adaptive” version of the agent, which selects a response policy based on the updated belief distribution, and a “Generalist” version, which selects a response based on the uniform prior. In short episodes of 10 games with a consistent partner, the “Adaptive” version outperforms the “Generalist” when the training and evaluation populations are the same. This presents a first step towards an agent that can model its partner and adapt within a time frame that is compatible with human interaction.
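
The belief-tracking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of maximizing expected score, and every number below are assumptions made purely for illustration. The sketch assumes each partner hypothesis assigns a likelihood to the partner's observed action and that a table of expected scores for each (response, hypothesis) pairing is available.

```python
def update_belief(belief, likelihoods):
    """One Bayes step: posterior is proportional to prior times likelihood.

    likelihoods[h] is the probability that hypothesis h assigns to the
    partner action that was just observed (hypothetical input).
    """
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        # Observation impossible under every hypothesis: keep the old belief.
        return list(belief)
    return [u / total for u in unnormalized]


def select_response(belief, payoff):
    """Return the index of the response with the highest expected score.

    payoff[r][h] is the expected score of response r when paired with a
    partner that follows hypothesis h (assumed selection rule).
    """
    expected = [sum(b * s for b, s in zip(belief, row)) for row in payoff]
    return max(range(len(payoff)), key=lambda r: expected[r])


# Illustrative numbers only, not results from the paper.
payoff = [
    [20, 5, 5],    # response 0: strong with hypothesis 0 only
    [5, 20, 5],    # response 1: strong with hypothesis 1 only
    [12, 12, 12],  # response 2: moderate with every hypothesis
]
uniform_prior = [1 / 3, 1 / 3, 1 / 3]

# "Generalist": respond to the uniform prior, ignoring observations.
generalist_choice = select_response(uniform_prior, payoff)

# "Adaptive": update the belief after each observed partner action.
belief = uniform_prior
for likelihoods in [[0.8, 0.1, 0.1]] * 3:  # three actions that favor hypothesis 0
    belief = update_belief(belief, likelihoods)
adaptive_choice = select_response(belief, payoff)
```

With these made-up numbers, the uniform prior favors the safe all-round response, while a belief concentrated on hypothesis 0 favors the response specialized for it, which is the kind of difference between the two agent versions that the abstract describes.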

Keywords

Learning (artificial intelligence), Naive Bayes methods, Computational and artificial intelligence, Evolutionary computation
