Multi-Agent Game Abstraction via Graph Attention Neural Network

AAAI Conference on Artificial Intelligence, 2020.

Keywords:
Markov decision process, multi-agent reinforcement learning, graph neural network, neural network, interaction relationship

Abstract:

In large-scale multi-agent systems, the large number of agents and complex game relationships cause great difficulty for policy learning. Therefore, simplifying the learning process is an important research issue. In many multi-agent systems, the interactions between agents often happen locally, which means that agents neither need to coordinate with all other agents nor need to coordinate with them all the time.

Introduction
  • Multi-agent reinforcement learning (MARL) has shown great success for solving sequential decision-making problems with multiple agents.
  • Recent work has focused on multi-agent reinforcement learning in large-scale multi-agent systems (Yang et al. 2018; Chen et al. 2018), in which the large number of agents and the complexity of their interactions pose a significant challenge to the policy learning process.
  • The Markov game, also known as a stochastic game, is widely adopted as the model of multi-agent reinforcement learning (MARL).
  • It can be treated as the extension of the Markov decision process (MDP) to the multi-agent setting.
  • Definition 1. An n-agent (n ≥ 2) Markov game is a tuple $\langle N, S, \{A_i\}_{i=1}^{n}, \{R_i\}_{i=1}^{n}, T\rangle$, where $N$ is the set of agents, $S$ is the state space, $A_i$ is the action space of agent $i$ ($i = 1, \ldots, n$), $R_i$ is the reward function of agent $i$, and $T$ is the state transition function (a minimal illustrative encoding of this tuple follows this list).
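To make Definition 1 concrete, below is a minimal illustrative encoding of the Markov game tuple in Python; the container fields and the tiny two-agent example are hypothetical and only mirror the definition above, not code from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class MarkovGame:
    """An n-agent Markov game <N, S, {A_i}, {R_i}, T> as in Definition 1."""
    agents: List[str]                                      # N: the set of agents
    states: Sequence[str]                                  # S: the state space
    actions: Dict[str, Sequence[str]]                      # A_i: action space of each agent i
    rewards: Dict[str, Callable[[str, tuple], float]]      # R_i(s, joint_action): reward of agent i
    transition: Callable[[str, tuple], Dict[str, float]]   # T(s, joint_action): distribution over next states


# Hypothetical two-agent example: a single state, agents rewarded for matching actions.
game = MarkovGame(
    agents=["agent_0", "agent_1"],
    states=["s0"],
    actions={"agent_0": ["stay", "move"], "agent_1": ["stay", "move"]},
    rewards={i: (lambda s, a: 1.0 if a[0] == a[1] else 0.0) for i in ["agent_0", "agent_1"]},
    transition=lambda s, a: {"s0": 1.0},
)
```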
Highlights
  • Multi-agent reinforcement learning (MARL) has shown great success for solving sequential decision-making problems with multiple agents
  • In large-scale multi-agent systems, the large number of agents and the complexity of interactions pose a significant challenge to the policy learning process
  • We propose a novel two-stage attention mechanism, G2ANet, for game abstraction, which can be combined with a graph neural network (GNN)
  • Inspired by the attention mechanism (Bahdanau, Cho, and Bengio 2014; Ba, Mnih, and Kavukcuoglu 2014; Mnih et al. 2014; Xu et al. 2015; Vaswani et al. 2017), we first propose the two-stage attention game abstraction algorithm called G2ANet, which learns the interaction relationships between agents through hard-attention and soft-attention mechanisms; a sketch of the hard-attention stage follows this list
  • We focus on the simplification of policy learning in large-scale multi-agent systems
  • Experimental results in Traffic Junction and Predator-Prey show that, with the novel game abstraction mechanism, the GA-Comm and GA-AC algorithms achieve better performance than state-of-the-art algorithms
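The hard-attention stage mentioned above decides, for every pair of agents, whether an interaction edge should be kept at all. The following is a minimal PyTorch sketch of that idea, using a small pairwise MLP scorer (an illustrative substitute for the paper's exact encoder) together with the Gumbel-Softmax trick (Jang, Gu, and Poole 2017) to obtain differentiable 0/1 edge decisions; all layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HardAttention(nn.Module):
    """Stage 1: decide for each agent pair (i, j) whether to keep the edge at all."""

    def __init__(self, emb_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Pairwise scorer: an illustrative MLP over concatenated agent embeddings.
        self.scorer = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # logits for {drop edge, keep edge}
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (n_agents, emb_dim) agent embeddings.
        n = h.size(0)
        h_i = h.unsqueeze(1).expand(n, n, -1)                # embedding of agent i, broadcast over j
        h_j = h.unsqueeze(0).expand(n, n, -1)                # embedding of agent j, broadcast over i
        logits = self.scorer(torch.cat([h_i, h_j], dim=-1))  # (n, n, 2)
        # Gumbel-Softmax yields (approximately) one-hot samples while staying differentiable.
        keep = F.gumbel_softmax(logits, tau=1.0, hard=True)[..., 1]  # (n, n) values in {0, 1}
        return keep * (1.0 - torch.eye(n))                   # drop self-edges: hard adjacency


# Usage (hypothetical sizes): 5 agents with 32-dimensional embeddings.
adj = HardAttention(emb_dim=32)(torch.randn(5, 32))
```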
Methods
  • The authors propose a novel game abstraction approach based on a two-stage attention mechanism (G2ANet).
  • G2ANet: Game Abstraction Based on Two-Stage Attention.
  • The authors define this graph as the Agent-Coordination Graph.
  • Definition 2 (Agent-Coordination Graph). The relationship between agents is defined as an undirected graph G = (N, E), consisting of the set N of nodes and the set E of edges, which are unordered pairs of elements of N.
  • Each node represents an agent entity, and each edge represents the relationship between the two adjacent agents; a sketch of the soft-attention aggregation over this abstracted graph follows this list
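To show how the abstracted Agent-Coordination Graph is then used, below is a minimal PyTorch sketch of the second, soft-attention stage: scaled dot-product attention restricted to the edges kept by the hard-attention stage, producing for each agent an aggregated feature over the agents it actually interacts with. The query/key/value projections and the masking scheme are illustrative assumptions, not necessarily the paper's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftAttention(nn.Module):
    """Stage 2: weight the neighbours kept by hard attention and aggregate their features."""

    def __init__(self, emb_dim: int, attn_dim: int = 32):
        super().__init__()
        self.w_q = nn.Linear(emb_dim, attn_dim, bias=False)  # query projection
        self.w_k = nn.Linear(emb_dim, attn_dim, bias=False)  # key projection
        self.w_v = nn.Linear(emb_dim, attn_dim, bias=False)  # value projection

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (n_agents, emb_dim) embeddings; adj: (n_agents, n_agents) hard 0/1 adjacency.
        q, k, v = self.w_q(h), self.w_k(h), self.w_v(h)
        scores = q @ k.t() / k.size(-1) ** 0.5                 # (n, n) pairwise scores
        scores = scores.masked_fill(adj == 0, float("-inf"))   # keep only edges from stage 1
        weights = F.softmax(scores, dim=-1)                    # soft attention over kept neighbours
        weights = torch.nan_to_num(weights)                    # agents with no kept edge get zero weights
        return weights @ v                                     # (n, attn_dim) aggregated neighbour features


# Usage with a hard adjacency such as the one from the previous sketch (hypothetical shapes).
h = torch.randn(5, 32)
adj = (torch.rand(5, 5) > 0.5).float() * (1 - torch.eye(5))
context = SoftAttention(emb_dim=32)(h, adj)
print(context.shape)  # torch.Size([5, 32])
```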
Results
  • The success rate of the proposed method is about 6%, 7% and 11% higher than IC3Net in the three difficulty levels, which verifies that the method remains more effective as the difficulty of the environment gradually increases.
Conclusion
  • The authors focus on the simplification of policy learning in large-scale multi-agent systems.
  • The authors learn the relationship between agents and achieve game abstraction by defining a novel attention mechanism.
  • At different time steps in an episode, the relationships between agents are constantly changing.
  • The authors can thus learn adaptive and dynamic attention values.
  • The authors' major contributions include the novel two-stage attention mechanism G2ANet and the two game-abstraction-based learning algorithms GA-Comm and GA-AC.
  • Experimental results in Traffic Junction and Predator-Prey show that, with the novel game abstraction mechanism, the GA-Comm and GA-AC algorithms achieve better performance than state-of-the-art algorithms
Tables
  • Table 1: Success rate in the Traffic Junction environment
Funding
  • This work is supported by the Science and Technology Innovation 2030 New Generation Artificial Intelligence Major Project (No. 2018AAA0100905), the National Natural Science Foundation of China (Nos. 61432008, 61702362, U1836214, 61403208), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Reference
  • [Ba, Mnih, and Kavukcuoglu] Ba, J.; Mnih, V.; and Kavukcuoglu, K. 2014. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755.
  • [Bahdanau, Cho, and Bengio] Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • [Chen et al.] Chen, Y.; Zhou, M.; Wen, Y.; Yang, Y.; Su, Y.; Zhang, W.; Zhang, D.; Wang, J.; and Liu, H. 2018. Factorized Q-learning for large-scale multi-agent systems. arXiv preprint arXiv:1809.03738.
  • [De Hauwere, Vrancx, and Nowe] De Hauwere, Y.-M.; Vrancx, P.; and Nowe, A. 2010. Learning multi-agent state space representations. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, 715–722.
  • [Foerster et al.] Foerster, J. N.; Farquhar, G.; Afouras, T.; Nardelli, N.; and Whiteson, S. 2018. Counterfactual multi-agent policy gradients. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • [Guestrin, Lagoudakis, and Parr] Guestrin, C.; Lagoudakis, M. G.; and Parr, R. 2002. Coordinated reinforcement learning. In Proceedings of the 19th International Conference on Machine Learning, 227–234.
  • [Hu, Gao, and An] Hu, Y.; Gao, Y.; and An, B. 2015. Learning in multi-agent systems with sparse interactions by knowledge transfer and game abstraction. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 753–761.
  • [Iqbal and Sha] Iqbal, S., and Sha, F. 2019. Actor-attention-critic for multi-agent reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning, 2961–2970.
  • [Jang, Gu, and Poole] Jang, E.; Gu, S.; and Poole, B. 2017. Categorical reparameterization with Gumbel-Softmax. In 5th International Conference on Learning Representations.
  • [Jiang and Lu] Jiang, J., and Lu, Z. 2018. Learning attentional communication for multi-agent cooperation. In Advances in Neural Information Processing Systems, 7254–7264.
  • [Jiang, Dun, and Lu] Jiang, J.; Dun, C.; and Lu, Z. 2018. Graph convolutional reinforcement learning for multi-agent cooperation. arXiv preprint arXiv:1810.09202.
  • [Kok and Vlassis] Kok, J. R., and Vlassis, N. A. 2004. Sparse cooperative Q-learning. In Proceedings of the 21st International Conference on Machine Learning, 61–68.
  • [Liu et al.] Liu, Y.; Hu, Y.; Gao, Y.; Chen, Y.; and Fan, C. 2019. Value function transfer for deep multi-agent reinforcement learning based on n-step returns. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 457–463.
  • [Lowe et al.] Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, O. P.; and Mordatch, I. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, 6379–6390.
  • [Melo and Veloso] Melo, F. S., and Veloso, M. M. 2011. Decentralized MDPs with sparse interactions. Artificial Intelligence 175(11):1757–1789.
  • [Mnih et al.] Mnih, V.; Heess, N.; Graves, A.; et al. 2014. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, 2204–2212.
  • [Mnih et al.] Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, 1928–1937.
  • [Rashid et al.] Rashid, T.; Samvelyan, M.; de Witt, C. S.; Farquhar, G.; Foerster, J. N.; and Whiteson, S. 2018. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, 4292–4301.
  • [Schulman et al.] Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
  • [Singh, Jain, and Sukhbaatar] Singh, A.; Jain, T.; and Sukhbaatar, S. 2019. Learning when to communicate at scale in multiagent cooperative and competitive tasks. In 7th International Conference on Learning Representations.
  • [Sukhbaatar, Fergus, and others] Sukhbaatar, S.; Fergus, R.; et al. 2016. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems, 2244–2252.
  • [Sunehag et al.] Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W. M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J. Z.; Tuyls, K.; et al. 2018. Value-decomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2085–2087.
  • [Vaswani et al.] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, 5998–6008.
  • [Wang et al.] Wang, X.; Girshick, R.; Gupta, A.; and He, K. 2018. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7794–7803.
  • [Williams] Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4):229–256.
  • [Xu et al.] Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhudinov, R.; Zemel, R.; and Bengio, Y. 2015. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning, 2048–2057.
  • [Yang et al.] Yang, Y.; Luo, R.; Li, M.; Zhou, M.; Zhang, W.; and Wang, J. 2018. Mean field multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, 5567–5576.
  • [Yu et al.] Yu, C.; Zhang, M.; Ren, F.; and Tan, G. 2015. Multiagent learning of coordination in loosely coupled multiagent systems. IEEE Transactions on Cybernetics 45(12):2853–2867.