Distributed Reinforcement Learning with Self-Play in Parameterized Action Space

SMC(2021)

Abstract
Self-play has been shown to be effective at providing a proper training curriculum for a reinforcement learning agent in competitive multi-agent environments without direct supervision. However, its performance is still unstable on problems with sparse rewards, e.g., the task of scoring against a goalkeeper for robots in RoboCup soccer. Such tasks are challenging for reinforcement learning, especially those that require combining high-level actions with flexible control. To address these challenges, we introduce a distributed self-play training framework for an extended proximal policy optimization (PPO) algorithm that learns to act in a parameterized action space and plays against a group of opponents, i.e., a league. Experiments in the domain of simulated RoboCup soccer show that the approach is effective and learns policies that are more robust against various opponents than those of existing reinforcement learning methods. A demonstration video is available online at https://youtu.be/BuLli1vND4.
Keywords
reinforcement learning, parameterized action space, distributed, self-play, multi-agents
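
The abstract refers to acting in a parameterized action space, where an agent picks a discrete high-level action and then the continuous parameters that belong to it. The following is a minimal sketch of that idea only, not the authors' implementation: the action names (`dash`, `turn`, `kick`), their parameter names, the fixed standard deviation, and the function `sample_action` are all assumptions made for illustration.

```python
import math
import random

# Hypothetical action set (an assumption, not taken from the paper):
# each discrete action type owns its own continuous parameters.
ACTION_PARAMS = {
    "dash": ("power", "direction"),
    "turn": ("direction",),
    "kick": ("power", "direction"),
}


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_log_prob(x, mean, std):
    """Log-density of x under a normal distribution N(mean, std^2)."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


def sample_action(logits, means, std, rng):
    """Sample a discrete action type, then only that type's parameters.

    Returns the action name, its sampled parameters, and the joint
    log-probability (discrete choice plus continuous parameters),
    which is the quantity a policy-gradient method would need.
    """
    names = list(ACTION_PARAMS)
    probs = softmax(logits)
    index = rng.choices(range(len(names)), weights=probs)[0]
    name = names[index]
    log_prob = math.log(probs[index])
    params = {}
    for param in ACTION_PARAMS[name]:
        mean = means[name][param]
        value = rng.gauss(mean, std)
        params[param] = value
        log_prob += gaussian_log_prob(value, mean, std)
    return name, params, log_prob


# Illustrative policy outputs; a real agent would get these from a network.
EXAMPLE_MEANS = {
    "dash": {"power": 50.0, "direction": 0.0},
    "turn": {"direction": 30.0},
    "kick": {"power": 80.0, "direction": -10.0},
}

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_action([0.1, 0.2, 0.3], EXAMPLE_MEANS, 5.0, rng))
```

In this sketch the parameters of the actions that were not chosen never enter the log-probability, which is one common way to handle such action spaces; the paper may treat them differently.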