DVF: Multi-agent Q-learning with difference value factorization

Knowledge-Based Systems (2024)

Abstract
In cooperative multi-agent games, agents are required to learn effective cooperative behaviors within complex action spaces. An effective approach is to utilize the Individual-Global-Max (IGM) principle to decompose the global reward signal into individual contributions from each agent. However, existing methods confront two significant challenges: ensuring the accuracy of the value function decomposition process and guaranteeing monotonic improvement of the joint policy. These challenges are exacerbated in scenarios involving non-monotonic payoff matrices. To address them, we introduce a novel and flexible value factorization method called Difference Value Factorization (DVF). The key idea of our method is to transform the IGM principle into a new form, DVF-IGM, which handles non-monotonic constraints by enforcing consistency of the IGM process between the joint difference value and a complex non-linear sum of individual difference values. A centralized evaluator is employed to estimate global Q-values, which not only enhances expressiveness but also constructs the difference values used to update the individual value functions. We show that DVF-IGM is an equivalent transformation of IGM and that DVF has the monotonic improvement property. Empirically, our method maintains and recovers the optimal policy in non-monotonic matrix games and achieves state-of-the-art performance on cooperative tasks in the StarCraft Multi-Agent Challenge (SMAC).
• We transform the IGM principle into a new form, DVF-IGM.
• We employ a centralized evaluator to enhance expressiveness.
• We maintain IGM consistency in non-monotonic payoff settings.
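The abstract does not give implementation details, so the following is only a minimal sketch of the two components it names: per-agent individual value functions and an unrestricted centralized evaluator that supplies "difference values". The definition of the difference value here (the centralized Q-value under the chosen joint action minus the value with one agent's action swapped for a baseline action) is an assumption in the spirit of classic difference rewards, not the authors' exact formulation, and all class and function names are hypothetical.

```python
# Minimal sketch (not the authors' code). The difference value for agent i is
# ASSUMED to be Q_tot with the chosen joint action minus Q_tot with agent i's
# action replaced by a baseline action, other agents' actions held fixed.
import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    """Individual utility Q_i(o_i, a_i) for one agent (hypothetical architecture)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs):
        return self.net(obs)  # [batch, n_actions]

class CentralEvaluator(nn.Module):
    """Centralized evaluator Q_tot(s, a_1..a_n); unrestricted in form, so it can
    represent non-monotonic payoffs that monotonic mixers cannot."""
    def __init__(self, state_dim, n_agents, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_agents * n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, 1)
        )

    def forward(self, state, joint_actions_onehot):
        # joint_actions_onehot: [batch, n_agents, n_actions]
        x = torch.cat([state, joint_actions_onehot.flatten(1)], dim=-1)
        return self.net(x)  # [batch, 1]

def difference_values(evaluator, state, actions_onehot, baseline_onehot):
    """Assumed difference value per agent: Q_tot(chosen joint action) minus
    Q_tot with agent i's action replaced by a baseline action."""
    n_agents = actions_onehot.shape[1]
    q_joint = evaluator(state, actions_onehot)
    deltas = []
    for i in range(n_agents):
        counterfactual = actions_onehot.clone()
        counterfactual[:, i] = baseline_onehot[:, i]
        deltas.append(q_joint - evaluator(state, counterfactual))
    return torch.cat(deltas, dim=-1)  # [batch, n_agents], one target per agent
```

Under this reading, the centralized evaluator is trained on the global reward signal, and each individual Q_i is regressed toward its difference value, which is one plausible way the factorization could stay consistent with IGM even when the payoff matrix is non-monotonic.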
Keywords
Multi-agent reinforcement learning, Value factorization, Individual-global-max, Reinforcement learning, Multi-agent system