A Theory of Mind Approach as Test-Time Mitigation Against Emergent Adversarial Communication

arxiv(2023)

引用 0|浏览2
暂无评分
摘要
Multi-Agent Systems (MAS) is the study of multi-agent interactions in a shared environment. Communication for cooperation is a fundamental construct for sharing information in partially observable environments. Cooperative Multi-Agent Reinforcement Learning (CoMARL) is a learning framework where we learn agent policies either with cooperative mechanisms or policies that exhibit cooperative behavior. Explicitly, there are works on learning to communicate messages from CoMARL agents; however, non-cooperative agents, when capable of access a cooperative team's communication channel, have been shown to learn adversarial communication messages, sabotaging the cooperative team's performance particularly when objectives depend on finite resources. To address this issue, we propose a technique which leverages local formulations of Theory-of-Mind (ToM) to distinguish exhibited cooperative behavior from non-cooperative behavior before accepting messages from any agent. We demonstrate the efficacy and feasibility of the proposed technique in empirical evaluations in a centralized training, decentralized execution (CTDE) CoMARL benchmark. Furthermore, while we propose our explicit ToM defense for test-time, we emphasize that ToM is a construct for designing a cognitive defense rather than be the objective of the defense.
更多
查看译文
关键词
adversarial communication,mind approach,test-time
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要