# Multi-Agent Game Abstraction via Graph Attention Neural Network

AAAI Conference on Artificial Intelligence, 2020.

Keywords:

Markov decision process, multi-agent reinforcement learning, graph neural network, neural network, interaction relationship

Abstract:

In large-scale multi-agent systems, the large number of agents and complex game relationships cause great difficulty for policy learning. Therefore, simplifying the learning process is an important research issue. In many multi-agent systems, the interactions between agents often happen locally, which means that agents neither need to co…

Introduction

- Multi-agent reinforcement learning (MARL) has shown great success in solving sequential decision-making problems with multiple agents.
- Recent work has focused on multi-agent reinforcement learning in large-scale multi-agent systems (Yang et al. 2018; Chen et al. 2018), in which the large number of agents and the complexity of their interactions make policy learning difficult.
- A Markov game, also known as a stochastic game, is widely adopted as the model for MARL.
- It can be treated as the extension of the Markov decision process (MDP) to the multi-agent setting.
- Definition 1 An n-agent (n ≥ 2) Markov game is a tuple ⟨N, S, {A_i}_{i=1}^n, {R_i}_{i=1}^n, T⟩, where N is the set of agents, S is the state space, A_i is the action space of agent i (i = 1, …, n), R_i is the reward function of agent i, and T is the state transition function.
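As a concrete reading of Definition 1, the tuple can be sketched as a small container type. The field names and the toy two-agent game below are illustrative labels for N, S, {A_i}, {R_i}, and T; they are not from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Illustrative sketch of the Markov-game tuple from Definition 1.
@dataclass
class MarkovGame:
    agents: List[int]              # N: the set of agents
    states: List[str]              # S: the state space
    actions: Dict[int, List[str]]  # A_i: action space of each agent i
    rewards: Dict[int, Callable]   # R_i: reward function of each agent i
    transition: Callable           # T: state transition function

# A toy 2-agent game with one state: each agent picks "stay" or "move",
# and an agent is rewarded when the joint action is coordinated.
game = MarkovGame(
    agents=[0, 1],
    states=["s0"],
    actions={0: ["stay", "move"], 1: ["stay", "move"]},
    rewards={i: (lambda s, a: 1.0 if a[0] == a[1] else 0.0) for i in (0, 1)},
    transition=lambda s, a: "s0",  # single-state game: always stay in s0
)
```

Definition 1 requires n ≥ 2, which here corresponds to `len(game.agents) >= 2`.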

Highlights

- Multi-agent reinforcement learning (MARL) has shown great success in solving sequential decision-making problems with multiple agents
- The large number of agents and the complexity of their interactions pose a significant challenge to the policy learning process
- We propose a novel two-stage attention mechanism, G2ANet, for game abstraction, which can be combined with graph neural networks (GNNs)
- Inspired by attention mechanisms (Bahdanau, Cho, and Bengio 2014; Ba, Mnih, and Kavukcuoglu 2014; Mnih et al. 2014; Xu et al. 2015; Vaswani et al. 2017), we first propose the two-stage attention game abstraction algorithm called G2ANet, which learns the interaction relationships between agents through hard-attention and soft-attention mechanisms
- We focus on the simplification of policy learning in large-scale multi-agent systems
- Experimental results in Traffic Junction and Predator-Prey show that with the novel game abstraction mechanism, the GA-Comm and GA-AC algorithms achieve better performance than state-of-the-art algorithms

Methods

- The authors propose a novel game abstraction approach based on a two-stage attention mechanism (G2ANet).
- G2ANet: Game Abstraction Based on Two-Stage Attention.
- The authors define the graph as an Agent-Coordination Graph.
- Definition 2 (Agent-Coordination Graph) The relationship between agents is defined as an undirected graph G = (N, E), consisting of the set N of nodes and the set E of edges, which are unordered pairs of elements of N.
- Each node represents an agent, and each edge represents the relationship between the two adjacent agents
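The two-stage idea (hard attention prunes the edges of the Agent-Coordination Graph; soft attention then weighs the neighbours that survive) can be sketched as follows. This is a minimal NumPy stand-in, not the paper's implementation: G2ANet learns the hard stage with a recurrent network and gumbel-softmax sampling, whereas here a simple similarity threshold plays that role, and the agent embeddings are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8                      # number of agents, embedding dimension
h = rng.normal(size=(n, d))      # placeholder per-agent feature embeddings

# Stage 1: hard attention -- a binary mask deciding WHICH pairs of agents
# interact, i.e. which edges of the coordination graph are kept.
scores = h @ h.T                 # pairwise similarity as a stand-in score
hard = (scores > 0).astype(float)
np.fill_diagonal(hard, 0.0)      # no self-edges in the coordination graph

# Stage 2: soft attention -- importance weights over the neighbours kept
# by stage 1 (a softmax restricted to the surviving edges).
def soft_attention(scores, mask):
    logits = np.where(mask > 0, scores, -np.inf)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = np.where(mask > 0, w, 0.0)          # zero out pruned edges
    return w / np.clip(w.sum(axis=1, keepdims=True), 1e-8, None)

weights = soft_attention(scores, hard)
# Each agent aggregates only its neighbours' features, softly weighted.
aggregated = weights @ h
```

The point of the two stages is that the soft weights are computed only over the sparse neighbourhood chosen by the hard stage, so irrelevant agents contribute exactly zero rather than a small nonzero attention weight.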

Results

- The success rate of the method is about 6%, 7%, and 11% higher than IC3Net at the three difficulty levels, which verifies that the method stays effective as the difficulty of the environment gradually increases.

Conclusion

- The authors focus on the simplification of policy learning in large-scale multi-agent systems.
- The authors learn the relationship between agents and achieve game abstraction by defining a novel attention mechanism.
- At different time steps in an episode, the relationship between agents is constantly changing.
- This allows the authors to learn adaptive and dynamic attention values.
- The authors' major contributions include the novel two-stage attention mechanism G2ANet, and the two game-abstraction-based learning algorithms GA-Comm and GA-AC.
- Experimental results in Traffic Junction and Predator-Prey show that with the novel game abstraction mechanism, the GA-Comm and GA-AC algorithms achieve better performance than state-of-the-art algorithms


- Table1: Success Rate in the Traffic Junction

Funding

- This work is supported by the Science and Technology Innovation 2030 New Generation Artificial Intelligence Major Project (No. 2018AAA0100905), the National Natural Science Foundation of China (Nos. 61432008, 61702362, U1836214, 61403208), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Reference

- [Ba, Mnih, and Kavukcuoglu] Ba, J.; Mnih, V.; and Kavukcuoglu, K. 2014. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755.
- [Bahdanau, Cho, and Bengio] Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- [Chen et al.] Chen, Y.; Zhou, M.; Wen, Y.; Yang, Y.; Su, Y.; Zhang, W.; Zhang, D.; Wang, J.; and Liu, H. 2018. Factorized q-learning for large-scale multi-agent systems. arXiv preprint arXiv:1809.03738.
- [De Hauwere, Vrancx, and Nowe] De Hauwere, Y.-M.; Vrancx, P.; and Nowe, A. 2010. Learning multi-agent state space representations. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, 715–722.
- [Foerster et al.] Foerster, J. N.; Farquhar, G.; Afouras, T.; Nardelli, N.; and Whiteson, S. 2018. Counterfactual multiagent policy gradients. In Thirty-Second AAAI Conference on Artificial Intelligence.
- [Guestrin, Lagoudakis, and Parr] Guestrin, C.; Lagoudakis, M. G.; and Parr, R. 2002. Coordinated reinforcement learning. In Proceedings of the 9th International Conference on Machine Learning, 227–234.
- [Hu, Gao, and An] Hu, Y.; Gao, Y.; and An, B. 2015. Learning in multi-agent systems with sparse interactions by knowledge transfer and game abstraction. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 753–761.
- [Iqbal and Sha] Iqbal, S., and Sha, F. 2019. Actor-attentioncritic for multi-agent reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning, 2961–2970.
- [Jang, Gu, and Poole] Jang, E.; Gu, S.; and Poole, B. 2017. Categorical reparameterization with gumbel-softmax. In 5th International Conference on Learning Representations.
- [Jiang and Lu] Jiang, J., and Lu, Z. 2018. Learning attentional communication for multi-agent cooperation. In Advances in Neural Information Processing Systems, 7254– 7264.
- [Jiang, Dun, and Lu] Jiang, J.; Dun, C.; and Lu, Z. 2018. Graph convolutional reinforcement learning for multi-agent cooperation. arXiv preprint arXiv:1810.09202.
- [Kok and Vlassis] Kok, J. R., and Vlassis, N. A. 2004. Sparse cooperative Q-learning. In Proceedings of the 21st International Conference on Machine Learning, 61–68.
- [Liu et al.] Liu, Y.; Hu, Y.; Gao, Y.; Chen, Y.; and Fan, C. 2019. Value function transfer for deep multi-agent reinforcement learning based on n-step returns. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 457–463.
- [Lowe et al.] Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, O. P.; and Mordatch, I. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, 6379–6390.
- [Melo and Veloso] Melo, F. S., and Veloso, M. M. 2011. Decentralized MDPs with sparse interactions. Artificial Intelligence 175(11):1757–1789.
- [Mnih et al.] Mnih, V.; Heess, N.; Graves, A.; et al. 2014. Recurrent models of visual attention. In Advances in neural information processing systems, 2204–2212.
- [Mnih et al.] Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, 1928– 1937.
- [Rashid et al.] Rashid, T.; Samvelyan, M.; de Witt, C. S.; Farquhar, G.; Foerster, J. N.; and Whiteson, S. 2018. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, 4292–4301.
- [Schulman et al.] Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- [Singh, Jain, and Sukhbaatar] Singh, A.; Jain, T.; and Sukhbaatar, S. 2019. Learning when to communicate at scale in multiagent cooperative and competitive tasks. In 7th International Conference on Learning Representations.
- [Sukhbaatar, Fergus, and others] Sukhbaatar, S.; Fergus, R.; et al. 2016. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems, 2244–2252.
- [Sunehag et al.] Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W. M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J. Z.; Tuyls, K.; et al. 2018. Valuedecomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2085–2087.
- [Vaswani et al.] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In Advances in neural information processing systems, 5998–6008.
- [Wang et al.] Wang, X.; Girshick, R.; Gupta, A.; and He, K. 2018. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7794–7803.
- [Williams] Williams, R. J. 1992. Simple statistical gradientfollowing algorithms for connectionist reinforcement learning. Machine learning 8(3-4):229–256.
- [Xu et al.] Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.; and Bengio, Y. 2015. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning, 2048–2057.
- [Yang et al.] Yang, Y.; Luo, R.; Li, M.; Zhou, M.; Zhang, W.; and Wang, J. 2018. Mean field multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, 5567–5576.
- [Yu et al.] Yu, C.; Zhang, M.; Ren, F.; and Tan, G. 2015. Multiagent learning of coordination in loosely coupled multiagent systems. IEEE Transactions on Cybernetics 45(12):2853–2867.
