MARL-PPS: Multi-agent Reinforcement Learning with Periodic Parameter Sharing
Adaptive Agents and Multi-Agent Systems (2019)
Abstract
We present a multi-agent reinforcement learning algorithm that is a simple yet effective modification of a known algorithm. External agents are modeled as a time-varying environment; their policy parameters are updated periodically, at a slower rate than the planner's, to make learning more stable and efficient. The replay buffer, which stores the experiences, is reset with the same long period so that samples are drawn from a fixed environment. This enables us to address challenging cooperative control problems in highway navigation. The resulting Multi-agent Reinforcement Learning with Periodic Parameter Sharing (MARL-PPS) algorithm outperforms the baselines in the multi-agent highway scenarios we tested.
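The update schedule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `PeriodicSharingTrainer`, the `share_period` parameter, the toy `update_fn`, and the assumption that external agents receive a copy of the planner's parameters are all hypothetical choices made for illustration. The abstract only states that external-agent parameters are updated at a slower rate and that the replay buffer is reset with the same period.

```python
import copy
import random
from collections import deque


class PeriodicSharingTrainer:
    """Toy sketch (hypothetical names): the planner is updated every step,
    while external agents receive a copy of its parameters only every
    `share_period` steps, at which point the replay buffer is reset."""

    def __init__(self, init_params, share_period, buffer_capacity=10_000):
        self.planner_params = dict(init_params)  # updated every step
        # Assumption: external agents reuse the planner's parameters,
        # held fixed between synchronizations.
        self.external_params = copy.deepcopy(self.planner_params)
        self.share_period = share_period
        self.buffer = deque(maxlen=buffer_capacity)
        self.step_count = 0

    def step(self, transition, update_fn, batch_size=4):
        # Store the experience and update the planner from a sampled batch.
        self.buffer.append(transition)
        k = min(batch_size, len(self.buffer))
        batch = random.sample(list(self.buffer), k)
        self.planner_params = update_fn(self.planner_params, batch)
        self.step_count += 1

        if self.step_count % self.share_period == 0:
            # Slow update: external agents now follow the latest planner.
            self.external_params = copy.deepcopy(self.planner_params)
            # Old samples came from the previous (now stale) environment.
            self.buffer.clear()


def toy_update(params, batch):
    # Stand-in for a real gradient step: just increments a counter.
    return {"w": params["w"] + 1.0}


trainer = PeriodicSharingTrainer({"w": 0.0}, share_period=5)
for _ in range(12):
    trainer.step(("obs", "act", 0.0, "next_obs"), toy_update)
```

In this toy run, the external agents lag behind the planner between synchronization points, and the buffer only ever holds transitions collected against the current external-agent parameters.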