# Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

Keywords:

centralized training with decentralized execution; multiagent reinforcement learning; value decomposition; Q-value; cooperative multi-agent

Abstract:

In many real-world settings, a team of cooperative agents must learn to coordinate its behavior under private observations and communication constraints. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance on these realistic and difficult problems but still face challenges. One branch is...

Introduction

- The cooperative multiagent reinforcement learning (MARL) problem has been studied extensively in the last decade: a system of agents learns coordinated policies that optimize the accumulated global reward (Busoniu et al., 2008; Gupta et al., 2017; Palmer et al., 2018).
- One natural way to address the cooperative MARL problem is the centralized approach, which views the multiagent system (MAS) as a whole and solves it as a single-agent learning task.
- In such settings, existing reinforcement learning (RL) techniques can be leveraged to learn joint optimal policies based on the agents' joint observations and common rewards (Tan, 1993).
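The value-decomposition direction that Qatten builds on starts from the opposite end: each agent keeps its own utility, and the joint value is assembled from the individual ones. A minimal toy sketch of the additive (VDN-style) case follows; all names, shapes, and random values are illustrative, not the paper's implementation.

```python
import numpy as np

# Toy sketch of additive value decomposition (VDN-style), the family of
# methods Qatten generalizes. Names and shapes here are illustrative only.
rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4

# Per-agent utilities Q_i(o_i, a_i); in practice produced by per-agent networks.
q_i = rng.normal(size=(n_agents, n_actions))

# Decentralized execution: each agent greedily maximizes its own Q_i ...
greedy = q_i.argmax(axis=1)

# ... which also maximizes the additive joint value Q_tot = sum_i Q_i,
# because the max of a sum of independent per-agent terms is the sum of
# the per-agent maxes.
q_tot_greedy = q_i.max(axis=1).sum()
```

This consistency between per-agent greedy actions and the joint maximum is exactly what the decomposition has to preserve, and what richer mixing networks such as Qatten's must maintain while being more expressive than a plain sum.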

Highlights

- We propose Qatten, a novel Q-value attention network, for the multiagent Q-value decomposition problem
- To approximate each term of the decomposition formula, we introduce multi-head attention to build the mixing network
- Qatten can be further enhanced by weighted head Q-values
- Experiments show that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmarks across different scenarios, and the attention analysis yields insights
- Experiments on the standard MARL benchmark show that our method obtains the best performance on almost all maps, and the attention analysis gives intuitive explanations of each agent's weight when approximating Qtot and, to some degree, reveals how Qtot is approximated from the individual Qi
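The multi-head attention mixing named in the highlights can be sketched as each head attending over the agents' individual Q-values and the total Q-value summing the head outputs. This is a hedged illustration: the random "embeddings", dimensions, and variable names below are stand-ins for the paper's learned query/key networks.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Illustrative sketch of attention-based mixing: each head produces
# weights lambda_{h,i} over agents and mixes the individual Q_i;
# Q_tot sums the head values.
rng = np.random.default_rng(1)
n_agents, n_heads, d = 4, 2, 8

q_i = rng.normal(size=n_agents)        # chosen per-agent Q-values Q_i
query = rng.normal(size=(n_heads, d))  # per-head query (from global state s)
keys = rng.normal(size=(n_agents, d))  # per-agent keys (from agent features)

q_tot = 0.0
for h in range(n_heads):
    lam = softmax(keys @ query[h] / np.sqrt(d))  # attention weights over agents
    q_tot += lam @ q_i                           # this head's weighted mix of Q_i
# (A state-dependent constant term is omitted in this sketch.)
```

Because each softmax weight is nonnegative, Q_tot stays monotonically increasing in every Qi, which is the property that lets decentralized greedy action selection remain consistent with maximizing the joint value.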

Results

- Experiments show that the method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmarks across different scenarios, and the accompanying attention analysis provides insights.

Conclusion

**Conclusion and Future Work**

In this paper, the authors propose Qatten, a novel Q-value attention network, for the multiagent Q-value decomposition problem.

- To approximate each term of the decomposition formula, the authors introduce multi-head attention to build the mixing network.
- Experiments on the standard MARL benchmark show that the method obtains the best performance on almost all maps, and the attention analysis gives intuitive explanations of each agent's weight when approximating Qtot and, to some degree, reveals how Qtot is approximated from the individual Qi.
- For future work, improving Qatten by combining it with an explicit exploration mechanism on difficult MARL tasks is a straightforward direction.
- Incorporating recent progress in attention to adapt Qatten to large-scale settings where hundreds of agents exist is also promising.
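The "weighted head Q-values" enhancement mentioned in the highlights can be sketched as scaling each head's mixed value by a nonnegative, state-dependent weight before summing. The linear "weight network" below is a hypothetical stand-in for the paper's learned module, included only to show the shape of the idea.

```python
import numpy as np

# Hedged sketch of weighted head Q-values: per-head nonnegative weights,
# produced from the global state, rescale each head's mixed value.
rng = np.random.default_rng(2)
n_heads, state_dim = 4, 6

head_values = rng.normal(size=n_heads)   # per-head mixed Q-values
state_feat = rng.normal(size=state_dim)  # global state features
W = rng.normal(size=(n_heads, state_dim))  # stand-in weight network

# Nonnegativity (here via abs) keeps Q_tot monotonic in each Q_i, so
# per-agent greedy action selection stays consistent with maximizing Q_tot.
w = np.abs(W @ state_feat)
q_tot = w @ head_values
```

The design point is that the extra expressiveness comes from modulating head contributions by the state, while the nonnegativity constraint preserves the monotonic mixing that decentralized execution relies on.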

Tables

- Table1: Maps in hard and super hard scenarios
- Table2: Median performance of the test win percentage
- Table3: The network configurations of Qatten’s mixing network

Reference

- Busoniu, L., Babuska, R., and De Schutter, B. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, 38(2): 156–172, 2008.
- Cao, Y., Yu, W., Ren, W., and Chen, G. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Transactions on Industrial Informatics, 9(1):427–438, 2012.
- Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. Counterfactual Multi-Agent Policy Gradients. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
- Graves, A., Wayne, G., and Danihelka, I. Neural Turing Machines. arXiv:1410.5401 [cs], December 2014. URL http://arxiv.org/abs/1410.5401.
- Gupta, J. K., Egorov, M., and Kochenderfer, M. Cooperative multi-agent control using deep reinforcement learning. In Proceedings of the 16th International Conference on Autonomous Agents and MultiAgent Systems, pp. 66–83, 2017.
- Littman, M. L. Markov games as a framework for multiagent reinforcement learning. In Machine Learning Proceedings 1994, pp. 157–163. Elsevier, 1994. doi: 10.1016/B978-1-55860-335-6.50027-1.
- Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., and Mordatch, I. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. In Proceedings of the 31st Advances in Neural Information Processing Systems, pp. 6379–6390. Curran Associates, Inc., 2017.
- Mahajan, A., Rashid, T., Samvelyan, M., and Whiteson, S. MAVEN: Multi-Agent Variational Exploration. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 7611–7622. Curran Associates, Inc., 2019.
- Matignon, L., Laurent, G. J., and Le Fort-Piat, N. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. The Knowledge Engineering Review, 27(1):1–31, 2012.
- Oh, J., Chockalingam, V., Singh, S., and Lee, H. Control of memory, active perception, and action in minecraft. In Proceedings of the 33rd International Conference on International Conference on Machine Learning, ICML’16, pp. 2790–2799. JMLR.org, 2016. URL http://dl.acm.org/citation.cfm?id=3045390.3045684.
- Palmer, G., Tuyls, K., Bloembergen, D., and Savani, R. Lenient multi-agent deep reinforcement learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 443–451, 2018.
- Rashid, T., Samvelyan, M., Witt, C. S. d., Farquhar, G., Foerster, J. N., and Whiteson, S. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. In Proceedings of the 35th International Conference on Machine Learning, pp. 4292–4301, 2018.
- Samvelyan, M., Rashid, T., de Witt, C. S., Farquhar, G., Nardelli, N., Rudner, T. G. J., Hung, C.-M., Torr, P. H. S., Foerster, J., and Whiteson, S. The StarCraft Multi-Agent Challenge. arXiv:1902.04043 [cs, stat], February 2019. URL http://arxiv.org/abs/1902.04043.
- Schroeder de Witt, C., Foerster, J., Farquhar, G., Torr, P., Boehmer, W., and Whiteson, S. Multi-Agent Common Knowledge Reinforcement Learning. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Proceedings of the Advances in Neural Information Processing Systems 32, pp. 9924–9935. Curran Associates, Inc., 2019.
- Son, K., Kim, D., Kang, W. J., Hostallero, D. E., and Yi, Y. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 5887–5896, Long Beach, California, USA, June 2019. PMLR.
- Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J. Z., Tuyls, K., and Graepel, T. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’18, pp. 2085–2087, Richland, SC, 2018. International Foundation for Autonomous Agents and Multiagent Systems. URL http://dl.acm.org/citation.cfm?id=3237383.3238080.
- Tan, M. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the 10th International Conference on Machine Learning, pp. 330–337. Morgan Kaufmann, 1993.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is All you Need. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 5998–6008. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
- Ying, W. and Sang, D. Multi-agent framework for third party logistics in e-commerce. Expert Systems with Applications, 29(2):431–436, 2005.
- Yun, C., Bhojanapalli, S., Rawat, A. S., Reddi, S. J., and Kumar, S. Are transformers universal approximators of sequence-to-sequence functions? In Proceedings of the 8th International Conference on Learning Representations, 2020.
Note on experimental setup: the experiments follow the settings of SMAC (Samvelyan et al., 2019), which are described in the SMAC paper; for clarity and completeness, the paper states these environment details again.
