Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

Keywords:
centralized training with decentralized execution; multiagent reinforcement learning; value decomposition; Q-value; cooperative multi-agent

Abstract:

In many real-world settings, a team of cooperative agents must learn to coordinate their behavior under private observations and communication constraints. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance on these realistic and difficult problems but still face challenges. One branch is...

Introduction
  • The cooperative multiagent reinforcement learning (MARL) problem has been studied extensively over the last decade; in this setting, a system of agents learns coordinated policies to optimize the accumulated global reward (Busoniu et al., 2008; Gupta et al., 2017; Palmer et al., 2018).
  • One natural way to address the cooperative MARL problem is the centralized approach, which views the multiagent system (MAS) as a whole and solves it as a single-agent learning task.
  • In such settings, existing reinforcement learning (RL) techniques can be leveraged to learn joint optimal policies based on the agents' joint observations and common rewards (Tan, 1993).
Highlights
  • The cooperative multiagent reinforcement learning (MARL) problem has been studied extensively over the last decade; in this setting, a system of agents learns coordinated policies to optimize the accumulated global reward (Busoniu et al., 2008; Gupta et al., 2017; Palmer et al., 2018)
  • One natural way to address the cooperative MARL problem is the centralized approach, which views the multiagent system (MAS) as a whole and solves it as a single-agent learning task. In such settings, existing reinforcement learning (RL) techniques can be leveraged to learn joint optimal policies based on the agents' joint observations and common rewards (Tan, 1993)
  • We propose the novel Q-value Attention network (Qatten) for the multiagent Q-value decomposition problem
  • To approximate each term of the decomposition formula, we introduce multi-head attention to construct the mixing network (an illustrative sketch follows this list)
  • Qatten can be further enhanced with weighted head Q-values
  • Experiments show that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmarks across different scenarios, and the attention analysis provides further insights
  • Experiments on the standard MARL benchmark show that our method obtains the best performance on almost all maps; the attention analysis gives intuitive explanations of each agent's weight in approximating Qtot and, to some degree, reveals how Qtot is composed from the individual Qi
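
    The following is a minimal PyTorch sketch of such an attention-based mixing network, written as an illustration rather than the authors' exact architecture: each attention head embeds the global state as a query and per-agent features as keys, the resulting softmax weights combine the individual Qi into a head Q-value, and the head Q-values are summed (optionally scaled by non-negative, state-dependent head weights) together with a state-dependent constant to produce Qtot. The layer sizes, the choice of per-agent features, and the head-weighting scheme below are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionMixer(nn.Module):
    """Illustrative attention-based mixing network in the spirit of Qatten.

    Combines per-agent Q-values Q_i into a global Q_tot via multi-head
    attention over agents. Hyperparameters and inputs are assumptions,
    not the paper's exact configuration.
    """

    def __init__(self, state_dim, agent_feat_dim,
                 embed_dim=32, n_heads=4, weighted_heads=True):
        super().__init__()
        self.n_heads = n_heads
        self.weighted_heads = weighted_heads
        # Per head: a query network over the global state and a key network
        # over per-agent features.
        self.query_nets = nn.ModuleList([
            nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                          nn.Linear(embed_dim, embed_dim))
            for _ in range(n_heads)])
        self.key_nets = nn.ModuleList([
            nn.Linear(agent_feat_dim, embed_dim) for _ in range(n_heads)])
        # State-dependent, non-negative weights over heads
        # (the "weighted head Q-values" enhancement).
        self.head_weight_net = nn.Linear(state_dim, n_heads)
        # State-dependent constant term c(s).
        self.bias_net = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(), nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state, agent_feats):
        # agent_qs:    (B, N)     Q_i of the action chosen by each agent
        # state:       (B, S)     global state
        # agent_feats: (B, N, F)  per-agent features (e.g. local observations)
        head_qs = []
        for h in range(self.n_heads):
            query = self.query_nets[h](state).unsqueeze(1)               # (B, 1, E)
            keys = self.key_nets[h](agent_feats)                         # (B, N, E)
            scores = torch.bmm(query, keys.transpose(1, 2)).squeeze(1)   # (B, N)
            attn = F.softmax(scores, dim=-1)                             # weights over agents
            head_qs.append((attn * agent_qs).sum(dim=-1))                # (B,)
        head_qs = torch.stack(head_qs, dim=-1)                           # (B, n_heads)
        if self.weighted_heads:
            w = torch.abs(self.head_weight_net(state))                   # keep head weights non-negative
            q_tot = (w * head_qs).sum(dim=-1, keepdim=True)
        else:
            q_tot = head_qs.sum(dim=-1, keepdim=True)
        return q_tot + self.bias_net(state)                              # (B, 1) estimate of Qtot
```

    During centralized training, agent_qs would typically be the Q-values of the actions actually taken by each agent, and the mixer is trained end-to-end with the agent networks using a standard TD loss on Qtot; decentralized execution still relies only on each agent's own Qi.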
Results
  • Experiments show that the method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmarks across different scenarios, and the attention analysis provides further insights.
Conclusion
  • Conclusion and Future Work

    In this paper, the authors propose the novel Q-value Attention network (Qatten) for the multiagent Q-value decomposition problem.
  • To approximate each term of the decomposition formula, the authors introduce multi-head attention to construct the mixing network (a sketch of such a decomposition form follows this list).
  • Experiments on the standard MARL benchmark show that the method obtains the best performance on almost all maps; the attention analysis gives intuitive explanations of each agent's weight in approximating Qtot and, to some degree, reveals how Qtot is composed from the individual Qi. For future work, improving Qatten by combining it with explicit exploration mechanisms on difficult MARL tasks is a straightforward direction.
  • Incorporating recent progress in attention mechanisms to adapt Qatten to large-scale settings with hundreds of agents is also promising.
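
    As a hedged illustration only (the exact coefficients and how they are derived from multi-head attention are the paper's contribution), a decomposition of this multi-head, attention-weighted form can be written as follows, where H is the number of heads, N the number of agents, the λ are non-negative state-dependent weights, and c(s) is a state-dependent constant:

```latex
% Illustrative form of a multi-head, attention-weighted decomposition of the global Q-value:
Q_{tot}(s, \mathbf{u}) \;\approx\; c(s) + \sum_{h=1}^{H} \sum_{i=1}^{N} \lambda_{i,h}(s)\, Q_i(\tau_i, u_i)
```

    With non-negative weights, Qtot is monotonically increasing in each Qi, so each agent greedily maximizing its own Qi over its local action-observation history τ_i remains consistent with maximizing the centralized Qtot, which is what makes decentralized execution possible.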
Tables
  • Table 1: Maps in hard and super hard scenarios
  • Table 2: Median performance of the test win percentage (an illustrative computation of this metric follows this list)
  • Table 3: The network configurations of Qatten's mixing network
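
    For concreteness, the metric in Table 2 can be read as follows. The snippet below is an illustrative computation only, assuming the common evaluation protocol of several independent training runs, each evaluated on a batch of test episodes with binary win/loss outcomes; the function name and data are hypothetical.

```python
import numpy as np

def median_test_win_percentage(outcomes_per_run):
    """outcomes_per_run: one 0/1 array of test-episode outcomes per independent run."""
    win_rates = [100.0 * np.mean(outcomes) for outcomes in outcomes_per_run]  # per-run win %
    return float(np.median(win_rates))  # median across runs, as reported per map

# Hypothetical example: three runs with 8 test episodes each.
runs = [np.array([1, 1, 0, 1, 1, 1, 0, 1]),
        np.array([1, 0, 0, 1, 1, 0, 1, 1]),
        np.array([1, 1, 1, 1, 0, 1, 1, 1])]
print(median_test_win_percentage(runs))  # -> 75.0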
Reference
  • Busoniu, L., Babuska, R., and De Schutter, B. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, 38(2): 156–172, 2008.
  • Cao, Y., Yu, W., Ren, W., and Chen, G. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Transactions on Industrial Informatics, 9(1):427–438, 2012.
  • Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. Counterfactual Multi-Agent Policy Gradients. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
  • Graves, A., Wayne, G., and Danihelka, I. Neural Turing Machines. arXiv:1410.5401 [cs], December 2014. URL http://arxiv.org/abs/1410.5401.
  • Gupta, J. K., Egorov, M., and Kochenderfer, M. Cooperative multi-agent control using deep reinforcement learning. In Proceedings of the 16th International Conference on Autonomous Agents and MultiAgent Systems, pp. 66–83, 2017.
  • Littman, M. L. Markov games as a framework for multiagent reinforcement learning. In Machine Learning Proceedings, pp. 157–163. Elsevier, 1994. doi: 10.1016/B978-1-55860-335-6.50027-1.
  • Lowe, R., Wu, Y., Tamar, A., Harb, J., Pieter Abbeel, O., and Mordatch, I. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. In Advances in Neural Information Processing Systems 30, pp. 6379–6390. Curran Associates, Inc., 2017.
  • Mahajan, A., Rashid, T., Samvelyan, M., and Whiteson, S. MAVEN: Multi-Agent Variational Exploration. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 32, pp. 7611–7622. Curran Associates, Inc., 2019.
  • Matignon, L., Laurent, G. J., and Le Fort-Piat, N. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. The Knowledge Engineering Review, 27(1):1–31, 2012.
  • Oh, J., Chockalingam, V., Singh, S., and Lee, H. Control of memory, active perception, and action in Minecraft. In Proceedings of the 33rd International Conference on Machine Learning, ICML'16, pp. 2790–2799. JMLR.org, 2016. URL http://dl.acm.org/citation.cfm?id=3045390.3045684.
  • Palmer, G., Tuyls, K., Bloembergen, D., and Savani, R. Lenient multi-agent deep reinforcement learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 443–451, 2018.
  • Rashid, T., Samvelyan, M., Witt, C. S. d., Farquhar, G., Foerster, J. N., and Whiteson, S. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. In Proceedings of the 35th International Conference on Machine Learning, pp. 4292–4301, 2018.
  • Samvelyan, M., Rashid, T., de Witt, C. S., Farquhar, G., Nardelli, N., Rudner, T. G. J., Hung, C.-M., Torr, P. H. S., Foerster, J., and Whiteson, S. The StarCraft Multi-Agent Challenge. arXiv:1902.04043 [cs, stat], February 2019. URL http://arxiv.org/abs/1902.04043.
  • Schroeder de Witt, C., Foerster, J., Farquhar, G., Torr, P., Boehmer, W., and Whiteson, S. Multi-Agent Common Knowledge Reinforcement Learning. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 32, pp. 9924–9935. Curran Associates, Inc., 2019.
  • Son, K., Kim, D., Kang, W. J., Hostallero, D. E., and Yi, Y. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 5887–5896, Long Beach, California, USA, June 2019. PMLR.
  • Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J. Z., Tuyls, K., and Graepel, T. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’18, pp. 2085–2087, Richland, SC, 2018. International Foundation for Autonomous Agents and Multiagent Systems. URL http://dl.acm.org/citation.cfm?id=3237383.3238080.
  • Tan, M. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the 10th International Conference on Machine Learning, pp. 330–337. Morgan Kaufmann, 1993.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention Is All You Need. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30, pp. 5998–6008. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
  • Ying, W. and Sang, D. Multi-agent framework for third party logistics in e-commerce. Expert Systems with Applications, 29(2):431–436, 2005.
  • Yun, C., Bhojanapalli, S., Rawat, A. S., Reddi, S. J., and Kumar, S. Are transformers universal approximators of sequence-to-sequence functions? In Proceedings of the 8th International Conference on Learning Representations, 2020.
  • Note: The experiments follow the settings of SMAC (Samvelyan et al., 2019), which can be found in the SMAC paper; for clarity and completeness, the paper restates these environment details.