DVF: Multi-agent Q-learning with difference value factorization

Knowledge-Based Systems (2024)

Abstract
In cooperative multi-agent games, agents are required to learn effective cooperative behaviors within complex action spaces. An effective approach is to utilize the Individual-Global-Max (IGM) principle to decompose the global reward signal into individual contributions from each agent. However, existing methods confront two significant challenges: ensuring the accuracy of the value function decomposition process and guaranteeing monotonic improvement of the joint policy. These challenges are exacerbated in scenarios involving non-monotonic payoff matrices. To address them, we introduce a novel and flexible value factorization method called Difference Value Factorization (DVF). The key idea of our method is to transform the IGM principle into a new form, DVF-IGM, which handles non-monotonic constraints by enforcing consistency of the IGM process between the joint difference value and a complex non-linear sum of individual difference values. A centralized evaluator is employed to estimate global Q-values, which not only enhances expressiveness but also constructs the difference values used to update the individual value functions. We show that DVF-IGM is an equivalent transformation of IGM and that DVF has the monotonic improvement property. Empirically, our method maintains and recovers the optimal policy in non-monotonic matrix games and achieves state-of-the-art performance on cooperative tasks in the StarCraft Multi-Agent Challenge (SMAC).
• We transform the IGM principle into a new form, DVF-IGM.
• We employ a centralized evaluator to enhance expressiveness.
• We maintain IGM consistency in non-monotonic payoff settings.
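The abstract does not give implementation details, so the following is only a minimal sketch of the two components it names: per-agent individual value functions and an unrestricted centralized evaluator that supplies "difference values". The definition of the difference value here (the centralized Q-value under the chosen joint action minus the value with one agent's action swapped for a baseline action) is an assumption in the spirit of classic difference rewards, not the authors' exact formulation, and all class and function names are hypothetical.

```python
# Minimal sketch (not the authors' code). The difference value for agent i is
# ASSUMED to be Q_tot with the chosen joint action minus Q_tot with agent i's
# action replaced by a baseline action, other agents' actions held fixed.
import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    """Individual utility Q_i(o_i, a_i) for one agent (hypothetical architecture)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs):
        return self.net(obs)  # [batch, n_actions]

class CentralEvaluator(nn.Module):
    """Centralized evaluator Q_tot(s, a_1..a_n); unrestricted in form, so it can
    represent non-monotonic payoffs that monotonic mixers cannot."""
    def __init__(self, state_dim, n_agents, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_agents * n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, 1)
        )

    def forward(self, state, joint_actions_onehot):
        # joint_actions_onehot: [batch, n_agents, n_actions]
        x = torch.cat([state, joint_actions_onehot.flatten(1)], dim=-1)
        return self.net(x)  # [batch, 1]

def difference_values(evaluator, state, actions_onehot, baseline_onehot):
    """Assumed difference value per agent: Q_tot(chosen joint action) minus
    Q_tot with agent i's action replaced by a baseline action."""
    n_agents = actions_onehot.shape[1]
    q_joint = evaluator(state, actions_onehot)
    deltas = []
    for i in range(n_agents):
        counterfactual = actions_onehot.clone()
        counterfactual[:, i] = baseline_onehot[:, i]
        deltas.append(q_joint - evaluator(state, counterfactual))
    return torch.cat(deltas, dim=-1)  # [batch, n_agents], one target per agent
```

Under this reading, the centralized evaluator is trained on the global reward signal, and each individual Q_i is regressed toward its difference value, which is one plausible way the factorization could stay consistent with IGM even when the payoff matrix is non-monotonic.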
Keywords
Multi-agent reinforcement learning, Value factorization, Individual-global-max, Reinforcement learning, Multi-agent system