Action Semantics Network: Considering the Effects of Actions in Multiagent Systems

ICLR, 2020.

Keywords: multiagent coordination, multiagent learning
TL;DR:
We propose a new network architecture, Action Semantics Network, to facilitate more efficient multiagent learning by explicitly investigating the semantics of actions between agents

Abstract:

In multiagent systems (MASs), each agent makes individual decisions but all of them contribute globally to the system evolution. Learning in MASs is difficult since each agent's selection of actions must take place in the presence of other co-learning agents. Moreover, the environmental stochasticity and uncertainties increase exponentially ...
Introduction
  • Palmer et al. (2018) extended the idea of leniency (Potter & De Jong, 1994; Panait et al., 2008) to deep MARL and proposed a retroactive temperature decay schedule to address the stochastic reward problem
  • All these works ignore the natural property of action influence between agents, which we aim to exploit to facilitate multiagent coordination.
Highlights
  • Deep reinforcement learning (DRL) (Sutton & Barto, 2018) has achieved substantial success in finding optimal policies for complex single-agent tasks (Mnih et al., 2015; Lillicrap et al., 2016; Silver et al., 2017)
  • A partially observable stochastic game (POSG) is defined as a tuple ⟨N, S, A1, · · · , An, T, R1, · · · , Rn, O1, · · · , On⟩, where N is the set of agents; S is the set of states; Ai is the set of actions available to agent i; T is the transition function; Ri is the reward function of agent i; and Oi is the set of observations of agent i (written out more formally in the sketch after this list)
  • We describe how the Action Semantics Network (ASN) can be incorporated into existing deep multiagent reinforcement learning algorithms, which can be classified into two paradigms: Independent Learner (IL) (Mnih et al., 2015; Schulman et al., 2017) and Joint Action Learner (JAL) (Lowe et al., 2017; Rashid et al., 2018; Foerster et al., 2018)
  • We evaluate the performance of the Action Semantics Network against different network structures, including the vanilla network, the dueling network (Wang et al., 2016), an attention network that is expected to learn automatically which information to focus on, and an entity-attention network, under various DRL approaches
  • We propose a new network architecture, Action Semantics Network, to facilitate more efficient multiagent learning by explicitly investigating the semantics of actions between agents
  • The Action Semantics Network greatly improves the performance of state-of-the-art DRL methods compared with a number of alternative network architectures
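For concreteness, the POSG tuple referenced above can be written out as follows. This is the standard formulation; the signatures of T and Ri below are the usual ones rather than text quoted from the paper:

```latex
% Standard POSG formulation (signatures are the usual ones, not quoted from the paper)
\langle N,\ S,\ A_1,\dots,A_n,\ T,\ R_1,\dots,R_n,\ O_1,\dots,O_n \rangle,
\qquad
T : S \times A_1 \times \cdots \times A_n \to \Delta(S),
\qquad
R_i : S \times A_1 \times \cdots \times A_n \to \mathbb{R}
```

Here Δ(S) denotes the set of probability distributions over S; instead of observing the true state, each agent i receives at every step an observation from its observation set Oi.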
Conclusion
  • We propose a new network architecture, ASN, to facilitate more efficient multiagent learning by explicitly investigating the semantics of actions between agents (a conceptual sketch of this idea follows this list).
  • To the best of our knowledge, ASN is the first to explicitly characterize the action semantics in MASs, which can be combined with various multiagent DRL algorithms to boost the learning performance.
  • It is worth investigating how to model the action semantics among more than two agents.
  • Another interesting direction is to consider the action semantics between agents in continuous action spaces.
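To make the notion of action semantics more concrete, below is a conceptual sketch in Python. It is not the paper's architecture: the split of the observation into a self/environment part and one slice per other agent follows the description above, but the layer shapes, the use of a dot product to score agent-directed actions, and all function names are illustrative assumptions.

```python
# Conceptual sketch of the action-semantics idea, NOT the paper's exact
# architecture: layer sizes, the dot-product scoring of agent-directed
# actions, and all names below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim):
    """A single random linear layer standing in for a learned sub-network."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    return lambda x: np.tanh(x @ W)

def asn_style_q_values(o_env_self, o_other_agents, n_env_actions, embed_dim=32):
    """Split the observation into a self/environment part and one slice per
    other agent; score environment-only actions from the self embedding alone,
    and each agent-directed action from a pairwise combination of embeddings."""
    f_self = mlp(len(o_env_self), embed_dim)
    f_other = mlp(len(o_other_agents[0]), embed_dim)
    head_env = mlp(embed_dim, n_env_actions)

    e_self = f_self(o_env_self)        # embedding of self/environment info
    q_env = head_env(e_self)           # Q-values of environment-only actions
    # one Q-value per other agent j, for the action directed at agent j
    q_directed = np.array([f_other(o_j) @ e_self for o_j in o_other_agents])
    return np.concatenate([q_env, q_directed])

# toy usage: 10-dim self/env features, 2 other agents with 4-dim slices each
q = asn_style_q_values(rng.standard_normal(10),
                       [rng.standard_normal(4), rng.standard_normal(4)],
                       n_env_actions=6)
print(q.shape)  # (8,) = 6 environment actions + 2 agent-directed actions
```

In ASN proper, these embeddings would be learned end to end with the chosen MARL algorithm (IL or JAL); here random linear layers merely stand in for the learned sub-networks.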
Tables
  • Table 1: Percentage of choosing a valid action for ASN-QMIX and vanilla-QMIX
  • Table 2: Hyperparameter settings for StarCraft II
  • Table 3: Parameters of all algorithms
Funding
  • This work is supported by the National Natural Science Foundation of China (Grant Nos. 61702362, U1836214, 61432008) and the Science and Technology Innovation 2030 “New Generation Artificial Intelligence” Major Project (No. 2018AAA0100905)
References
  • Lucian Busoniu, Robert Babuska, and Bart De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156–172, 2008.
  • Caroline Claus and Craig Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National Conference on Artificial Intelligence and Tenth Innovative Applications of Artificial Intelligence Conference, pp. 746–752, 1998.
  • Jakob N Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • Jayesh K Gupta, Maxim Egorov, and Mykel Kochenderfer. Cooperative multi-agent control using deep reinforcement learning. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems, Workshops, pp. 66–83, 2017.
  • Eric A Hansen, Daniel S Bernstein, and Shlomo Zilberstein. Dynamic programming for partially observable stochastic games. In Proceedings of the Nineteenth National Conference on Artificial Intelligence, volume 4, pp. 709–715, 2004.
  • Yann-Michael De Hauwere, Sam Devlin, Daniel Kudenko, and Ann Nowé. Context-sensitive reward shaping for sparse interaction multi-agent systems. Knowledge Engineering Review, 31(1):59–76, 2016.
  • Junling Hu and Michael P. Wellman. Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the Fifteenth International Conference on Machine Learning, pp. 242–250, 1998.
  • Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. In Proceedings of the 4th International Conference on Learning Representations, 2016.
  • Michael L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, pp. 157–163, 1994.
  • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pp. 6379–6390, 2017.
  • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.
  • Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, pp. 1928–1937, 2016.
  • Kwang-Kyo Oh, Myoung-Chul Park, and Hyo-Sung Ahn. A survey of multi-agent formation control. Automatica, 53:424–440, 2015.
  • OpenAI. OpenAI Five. https://blog.openai.com/openai-five/.
  • Gregory Palmer, Karl Tuyls, Daan Bloembergen, and Rahul Savani. Lenient multi-agent deep reinforcement learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 443–451, 2018.
  • Liviu Panait, Karl Tuyls, and Sean Luke. Theoretical advantages of lenient learners: An evolutionary game theoretic perspective. J. Mach. Learn. Res., 9:423–457, 2008.
  • Mitchell A. Potter and Kenneth A. De Jong. A cooperative coevolutionary approach to function optimization. In Proceedings of the International Conference on Evolutionary Computation, pp. 249–257, 1994.
  • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, pp. 4292–4301, 2018.
  • Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob N. Foerster, and Shimon Whiteson. The StarCraft multi-agent challenge. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 2186–2188, 2019.
  • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of go without human knowledge. Nature, 550(7676):354, 2017.
  • Amanpreet Singh, Tushar Jain, and Sainbayar Sukhbaatar. Individualized controlled continuous communication model for multiagent cooperative and competitive tasks. In Proceedings of the 7th International Conference on Learning Representations, 2019.
  • Adrian Sosic, Wasiur R. KhudaBukhsh, Abdelhak M. Zoubir, and Heinz Koeppl. Inverse reinforcement learning in swarm systems. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 1413–1421, 2017.
  • H Eugene Stanley. Phase transitions and critical phenomena. Clarendon Press, Oxford, 1971.
  • Joseph Suarez, Yilun Du, Phillip Isola, and Igor Mordatch. Neural MMO: A massively multiagent game environment for training and evaluating intelligent agents. arXiv preprint arXiv:1903.00784, 2019.
  • Sainbayar Sukhbaatar, Rob Fergus, et al. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems, pp. 2244–2252, 2016.
  • Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z Leibo, Karl Tuyls, et al. Value-decomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 2085–2087, 2018.
  • Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 2018.
  • State Description: In StarCraft II, we follow the settings of previous works (Rashid et al., 2018; Samvelyan et al., 2019). The local observation of each agent is drawn from within its field of view, i.e., the circular area of the map surrounding the unit with a radius equal to the sight range. Each agent receives as input a vector consisting of the following features for all units in its field of view (both allied and enemy): distance, relative x, relative y, and unit type. More details can be found at https://github.com/MAS-anony/ASN or https://github.com/oxwhirl/smac. A minimal sketch of how such an observation vector might be assembled is given below.
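As referenced above, here is a minimal sketch of how such a per-agent observation vector might be assembled. The `Unit` structure, the normalization by sight range, and the function name are assumptions made for illustration; the actual feature extraction lives in the SMAC code base linked above.

```python
# Illustrative sketch only: the Unit structure, the normalization by sight
# range, and the function name are assumptions, not the exact SMAC
# implementation (see https://github.com/oxwhirl/smac for the real one).
from dataclasses import dataclass
from typing import List
import math

@dataclass
class Unit:
    x: float
    y: float
    unit_type: int  # integer id of the unit type

def local_observation(agent: Unit, units: List[Unit], sight_range: float) -> List[float]:
    """Per-agent observation: [distance, relative x, relative y, unit type]
    for every other unit (allied or enemy) inside the agent's circular
    field of view."""
    features: List[float] = []
    for u in units:
        if u is agent:
            continue
        dx, dy = u.x - agent.x, u.y - agent.y
        dist = math.hypot(dx, dy)
        if dist > sight_range:      # outside the field of view: not observed
            continue
        features.extend([dist / sight_range,   # distance (normalized)
                         dx / sight_range,     # relative x
                         dy / sight_range,     # relative y
                         float(u.unit_type)])  # unit type
    return features
```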