Learning When to Transfer among Agents: An Efficient Multiagent Transfer Learning Framework

Keywords:
Successor Representation Option, Policy Gradient, Graph Neural Network, Deep Q-Network, Partially Observable Stochastic Game

Abstract:

Transfer learning has shown great potential to enhance single-agent Reinforcement Learning (RL) efficiency by sharing previously learned policies. Inspired by this, team learning performance in multiagent settings can potentially be promoted by agents reusing knowledge from each other while all agents interact with the envi...

Introduction
  • Recent advances in Deep Reinforcement Learning (DRL) have achieved remarkable success, reaching human-level control in complex tasks [Mnih et al., 2015; Lillicrap et al., 2016; Mnih et al., 2016].
  • However, DRL still suffers from sample inefficiency, which makes it difficult to learn from scratch.
  • This situation becomes worse in multiagent systems (MASs) due to the exponential growth of the state-action space.
  • One major direction of work focuses on transferring knowledge across multiagent tasks to accelerate multiagent reinforcement learning (MARL).
  • Wang et al. [2020] proposed dynamic multiagent curriculum learning for large-scale multiagent learning, in which three kinds of transfer mechanisms transfer knowledge across curricula.
Highlights
  • Recent advances in Deep Reinforcement Learning (DRL) have achieved remarkable success, reaching human-level control in complex tasks [Mnih et al., 2015; Lillicrap et al., 2016; Mnih et al., 2016]
  • We propose a novel option learning algorithm, successor representation option (SRO) learning, which decouples the dynamics of the environment from the rewards to learn the option-value function under each agent’s preference
  • We evaluate the performance of our multi-option transfer framework against vanilla single-agent DRL algorithms (PPO [Schulman et al., 2017])
  • Since the two ghosts in Pac-Man have no inconsistency in their reward functions, we evaluate the performance of Multiagent Option-based Policy Transfer (MAOPT) against independent PPO learning from scratch
  • We propose a novel multiagent transfer learning framework (MAOPT) for efficient multiagent learning by taking advantage of option-based policy transfer
  • To address the problem of sample conflicts, we propose a novel option learning framework, the successor representation option (SRO) framework, which decouples the dynamics of the environment from the rewards to learn the option-value function under each agent’s preference (a minimal sketch of this decoupling follows this list)
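The decoupling mentioned in the last two highlights follows the successor-representation idea [Dayan, 1993; Kulkarni et al., 2016]: an option value can be factored into expected discounted feature occupancies (shared dynamics) and an agent-specific reward weight. The snippet below is only a minimal tabular sketch of such a factorization; the names (psi, w_agent, alpha, gamma) and the TD updates are illustrative assumptions, not the paper's exact SRO algorithm.

```python
import numpy as np

# Minimal tabular sketch of a successor-representation option value, assuming a
# linear reward model r(s) ~= phi(s) . w_agent. Illustrative only, not MAOPT's code.
n_states, n_options = 25, 4
gamma, alpha = 0.95, 0.1

phi = np.eye(n_states)                            # one-hot state features
psi = np.zeros((n_states, n_options, n_states))   # successor features per (state, option)
w_agent = np.zeros(n_states)                      # this agent's own reward weights

def sro_value(s, o):
    """Option value = shared dynamics (psi) times this agent's reward weights."""
    return psi[s, o] @ w_agent

def td_update(s, o, r, s_next, o_next):
    """One-step TD updates that keep dynamics learning and reward learning separate."""
    # Successor-feature target depends only on transitions, so it can be shared.
    target = phi[s] + gamma * psi[s_next, o_next]
    psi[s, o] += alpha * (target - psi[s, o])
    # Reward-weight regression depends only on this agent's own reward signal.
    w_agent[:] += alpha * (r - phi[s] @ w_agent) * phi[s]
```

Because psi is driven only by transitions while w_agent is driven only by one agent's rewards, agents whose reward functions conflict can still share experience about the dynamics, which is the sample-conflict issue the SRO highlight refers to.
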
Results
  • The authors evaluate the performance of the multi-option transfer framework against vanilla single-agent DRL algorithms (PPO [Schulman et al., 2017]).
  • Figure 4 presents the layout of Pac-Man [van der Ouderaa, 2016], a competitive maze game with one pac-man player and two ghost players.
  • The goal of the pac-man player is to eat as many pills as possible while avoiding the pursuit of the ghost players.
  • The ghost players aim to capture the pac-man player as soon as possible.
  • Each ghost player receives a −0.01 penalty at each step and a +5 reward for catching the pac-man player (a minimal reward sketch follows this list)
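To make the ghost-side objective above concrete, a reward function consistent with the quoted numbers might look like the sketch below; the function name and the caught flag are hypothetical, and the actual Pac-Man environment [van der Ouderaa, 2016] may apply additional shaping.

```python
def ghost_step_reward(caught: bool) -> float:
    """Per-step reward for a ghost player, assuming the -0.01/+5 values quoted above."""
    reward = -0.01          # time penalty each step, encouraging a fast capture
    if caught:
        reward += 5.0       # bonus when the pac-man player is caught
    return reward
```
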
Conclusion
  • The authors propose a novel multiagent transfer learning framework (MAOPT) for efficient multiagent learning by taking advantage of option-based policy transfer.
  • The framework can be combined with existing DRL approaches (a rough illustration follows this list).
  • Experimental results show that it significantly accelerates the learning process and surpasses state-of-the-art DRL methods in learning efficiency and final performance in both discrete and continuous action spaces.
  • As for future work, it is worth investigating how to integrate explicit coordination mechanisms, e.g., coordinated exploration and credit assignment, into MAOPT to facilitate multiagent coordination.
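As a rough illustration of how option-based policy transfer can be layered on an existing DRL learner, the sketch below wraps a PPO-style agent with an option module that, at each step, either keeps the agent's own action or executes the action suggested by a peer's policy. All names (OptionAdvisor, select_option, peer_policies) and the gym-like interfaces are assumptions for illustration, not MAOPT's actual components.

```python
import random

class OptionAdvisor:
    """Toy stand-in for the transfer decision: choose which peer policy, if any,
    to reuse this step. MAOPT learns this choice; here it is just epsilon-random."""

    def __init__(self, n_peers, epsilon=0.3):
        self.n_peers, self.epsilon = n_peers, epsilon

    def select_option(self, obs):
        if self.n_peers > 0 and random.random() < self.epsilon:
            return random.randrange(self.n_peers)  # index of a peer policy to imitate
        return None                                # fall back to the agent's own policy

def run_episode(env, own_agent, peer_policies, advisor):
    """One episode in which transfer is realized by executing a peer's suggested action."""
    obs, done = env.reset(), False
    while not done:
        option = advisor.select_option(obs)
        actor = own_agent if option is None else peer_policies[option]
        action = actor.act(obs)
        next_obs, reward, done, _ = env.step(action)
        own_agent.observe(obs, action, reward, next_obs, done)  # e.g., fill a PPO buffer
        obs = next_obs
```

The point of the sketch is only the control flow: transfer happens through which action gets executed, so the underlying learner (PPO here, or any other DRL algorithm) is left untouched, which is what "can be combined with existing DRL approaches" amounts to.
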
Summary
  • Objectives:

    The authors aim to control the two ghost players, while the pac-man player, as the opponent, is controlled by a well pre-trained PPO policy.
References
  • [Agarwal et al., 2019] Akshat Agarwal, Sumit Kumar, and Katia P. Sycara. Learning transferable cooperative behavior in multi-agent teams. CoRR, abs/1906.01202, 2019.
  • [Bacon et al., 2017] Pierre-Luc Bacon, Jean Harb, and Doina Precup. The option-critic architecture. In Proceedings of AAAI, pages 1726–1734, 2017.
  • [Boutsioukis et al., 2011] Georgios Boutsioukis, Ioannis Partalas, and Ioannis P. Vlahavas. Transfer learning in multi-agent reinforcement learning domains. In Recent Advances in Reinforcement Learning - 9th European Workshop, pages 249–260, 2011.
  • [Bu et al., 2008] Lucian Buşoniu, Robert Babuška, and Bart De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, 38(2):156–172, 2008.
  • [Claus and Boutilier, 1998] Caroline Claus and Craig Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of AAAI/IAAI, pages 746–752, 1998.
  • [da Silva and Costa, 2019] Felipe Leno da Silva and Anna Helena Reali Costa. A survey on transfer learning for multiagent reinforcement learning systems. Journal of Artificial Intelligence Research, 64:645–703, 2019.
  • [Dayan, 1993] Peter Dayan. Improving generalization for temporal difference learning: The successor representation. Neural Computation, 5(4):613–624, 1993.
  • [Didi and Nitschke, 2016] Sabre Didi and Geoff Nitschke. Multi-agent behavior-based policy transfer. In Proceedings of European Conference on the Applications of Evolutionary Computation, pages 181–197, 2016.
  • [Hansen et al., 2004] Eric A Hansen, Daniel S Bernstein, and Shlomo Zilberstein. Dynamic programming for partially observable stochastic games. In Proceedings of AAAI, volume 4, pages 709–715, 2004.
  • [Hernandez-Leal et al., 2019] Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. Taylor. A survey and critique of multiagent deep reinforcement learning. Autonomous Agents and Multi-Agent Systems, 33(6):750–797, 2019.
  • [Hu and Wellman, 1998] Junling Hu and Michael P. Wellman. Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of ICML, pages 242–250, 1998.
  • [Hu et al., 2015] Yujing Hu, Yang Gao, and Bo An. Accelerating multiagent reinforcement learning by equilibrium transfer. IEEE Trans. Cybernetics, 45(7):1289–1302, 2015.
  • [Kulkarni et al., 2016] Tejas D Kulkarni, Ardavan Saeedi, Simanta Gautam, and Samuel J Gershman. Deep successor reinforcement learning. arXiv preprint arXiv:1606.02396, 2016.
  • [Lillicrap et al., 2016] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. In Proceedings of ICLR, 2016.
  • [Littman, 1994] Michael L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of ICML, pages 157–163, 1994.
  • [Lowe et al., 2017] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proceedings of NeurIPS, pages 6379–6390, 2017.
  • [Mnih et al., 2015] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin A. Riedmiller, Andreas Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.
  • [Mnih et al., 2016] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Proceedings of ICML, pages 1928–1937, 2016.
  • [Omidshafiei et al., 2019] Shayegan Omidshafiei, Dong-Ki Kim, Miao Liu, Gerald Tesauro, Matthew Riemer, Christopher Amato, Murray Campbell, and Jonathan P. How. Learning to teach in cooperative multiagent reinforcement learning. In Proceedings of AAAI, pages 6128–6136, 2019.
  • [Schulman et al., 2017] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • [Sutton and Barto, 1998] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 1998.
  • [Sutton et al., 1999] Richard S. Sutton, Doina Precup, and Satinder Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1):181–211, 1999.
  • [van der Ouderaa, 2016] Tycho van der Ouderaa. Deep reinforcement learning in pac-man. 2016.
  • [Wang et al., 2020] Weixun Wang, Tianpei Yang, Yong Liu, Jianye Hao, Xiaotian Hao, Yujing Hu, Yingfeng Chen, Changjie Fan, and Yang Gao. From few to more: Large-scale dynamic multiagent curriculum learning. In Proceedings of AAAI, 2020.
  • [Watkins and Dayan, 1992] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3-4):279–292, 1992.
  • [Yin and Pan, 2017] Haiyan Yin and Sinno Jialin Pan. Knowledge transfer for deep reinforcement learning with hierarchical experience replay. In Proceedings of AAAI, pages 1640–1646, 2017.