Bayes-ToMoP: A Fast Detection and Best Response Algorithm Towards Sophisticated Opponents.

Tianpei Yang, Zhaopeng Meng, Jianye Hao, Chongjie Zhang, Yan Zheng

arXiv: Multiagent Systems (2018)


Abstract

Multiagent algorithms often aim to accurately predict the behaviors of other agents and to find a best response accordingly during interactions. Previous works usually assume that an opponent uses a stationary strategy or randomly switches among several stationary ones. In practice, however, an opponent may exhibit more sophisticated behaviors by adopting more advanced strategies, e.g., a Bayesian reasoning strategy. This paper presents a novel algorithm called Bayes-ToMoP, which can efficiently detect and handle opponents using either stationary or higher-level reasoning strategies. Bayes-ToMoP also supports detecting previously unseen policies and learning a best-response policy accordingly. We provide theoretical guarantees of the optimal detection of the opponent's strategies. We also propose deep Bayes-ToMoP, which extends Bayes-ToMoP with deep reinforcement learning (DRL) techniques. Experimental results show that both Bayes-ToMoP and deep Bayes-ToMoP outperform the state-of-the-art approaches when faced with different types of opponents in two-agent competitive games.
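
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it names: keep a Bayesian belief over a known set of opponent strategy types, update it from an observed signal, and choose the response with the highest expected payoff under that belief. It is not the paper's Bayes-ToMoP procedure. The function names, the two opponent types, the likelihood values, and the payoff matrix are all hypothetical and chosen for illustration.

```python
import numpy as np


def update_belief(belief, likelihoods):
    """One Bayes-rule step: posterior is proportional to likelihood * prior.

    belief[j]      = prior probability that the opponent is of type j
    likelihoods[j] = P(observed signal | opponent type j)  (assumed given)
    """
    posterior = belief * likelihoods
    total = posterior.sum()
    if total == 0.0:
        # The observation is impossible under every known type; keep the prior.
        return belief
    return posterior / total


def best_response(belief, payoff):
    """Return the index of the own policy with the highest expected payoff.

    payoff[i, j] = expected payoff of own policy i against opponent type j
    """
    return int(np.argmax(payoff @ belief))


# Hypothetical numbers: two opponent types, two own policies.
belief = np.array([0.5, 0.5])
likelihoods = np.array([0.9, 0.3])
payoff = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])

belief = update_belief(belief, likelihoods)
choice = best_response(belief, payoff)
```

With these made-up numbers the belief shifts toward the first opponent type, so the first own policy is selected. Handling opponents that themselves reason about the agent, or policies not in the known set, would need machinery beyond this sketch.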


Keywords

Multiagent learning, policy reuse, theory of mind
