Trajectory Diversity for Zero-Shot Coordination

International Conference on Machine Learning (ICML), PMLR Vol. 139, 2021

Cited 62
Abstract
We study the problem of zero-shot coordination (ZSC), where agents must independently produce strategies for a collaborative game that are compatible with novel partners not seen during training. Our first contribution is to consider the need for diversity in generating such agents. Because self-play agents control their own trajectory distribution during training, their policies perform well only on that exact distribution. As a result, they achieve low scores in ZSC, since playing with another agent is likely to put them in situations they have not encountered during training. To address this issue, we train a common best response (BR) to a population of agents, which we regularize to be as diverse as possible. For this purpose, we introduce Trajectory Diversity (TrajeDi), a differentiable objective for generating diverse reinforcement learning (RL) policies. We present TrajeDi as a generalization of the Jensen-Shannon divergence (JSD) between policies and motivate it experimentally in a simple matrix game, where it allows us to find the unique ZSC-optimal solution.
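
As a rough illustration of the kind of quantity the abstract refers to (a sketch under our own assumptions, not the authors' implementation): the JSD between n policies' trajectory distributions can be written as JSD = (1/n) Σ_i KL(π_i ‖ π̄), where π̄ = (1/n) Σ_j π_j is the uniform mixture, and it can be estimated from sampled trajectories. The helper name jsd_estimate and the array layout below are hypothetical; TrajeDi's per-timestep discounting generalization is omitted.

```python
import numpy as np

def jsd_estimate(logp: np.ndarray) -> float:
    """Monte Carlo estimate of JSD(pi_1, ..., pi_n) over trajectories.

    logp[i, k, j] = log pi_j(tau_ik): the log-probability under policy j
    of the k-th trajectory sampled from policy i (shape: n x m x n).
    Uses JSD = (1/n) * sum_i KL(pi_i || pi_bar), with the uniform
    mixture pi_bar = (1/n) * sum_j pi_j.
    """
    n = logp.shape[0]
    # log pi_bar(tau) = logsumexp_j log pi_j(tau) - log n, shape (n, m)
    log_mix = np.logaddexp.reduce(logp, axis=-1) - np.log(n)
    # log pi_i(tau_ik): each trajectory under its own policy, shape (n, m)
    log_own = logp.diagonal(axis1=0, axis2=2).T
    # Average of log pi_i(tau) - log pi_bar(tau) over sampled trajectories
    return float(np.mean(log_own - log_mix))
```

Because the estimate is a function of trajectory log-probabilities, it is differentiable with respect to policy parameters when those log-probabilities come from an autodiff framework, which is what allows a diversity objective of this kind to be used as a training regularizer, as the abstract describes.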
Keywords
coordination, diversity, zero-shot