Improving Sample Efficiency in Behavior Learning by Using Sub-optimal Planners for Robots

ROBOT WORLD CUP XXIV, ROBOCUP 2021 (2022)

Abstract
The design and implementation of behaviors for robots operating in dynamic and complex environments are becoming mandatory in today's applications. Reinforcement learning consistently shows remarkable results in learning effective action policies and in achieving super-human performance on various tasks, without exploiting prior knowledge. However, in robotics, the use of purely learning-based techniques is still subject to strong limitations, foremost among them sample efficiency: such techniques are known to require large training datasets and long training sessions in order to develop effective action policies. Hence, in this paper, to alleviate this constraint and to enable learning in such robotic scenarios, we introduce SErP (Sample Efficient robot Policies), an iterative algorithm that improves the sample efficiency of learning algorithms. SErP exploits a sub-optimal planner (here implemented with a monitor-replanning algorithm) to guide the exploration of the learning agent through its initial iterations. Intuitively, SErP uses the planner as an expert in order to enable focused exploration and to avoid portions of the search space that are not effective for solving the robot's task. Finally, to confirm our insights and to show the improvements that SErP brings, we report the results obtained in two different robotic scenarios: (1) a cartpole scenario and (2) a soccer-robots scenario within the RoboCup@Soccer SPL environment.
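The abstract does not spell out the mechanism, but the core idea it describes (deferring to a sub-optimal planner early in training, then handing exploration back to the learned policy) can be sketched roughly as below. This is a minimal illustrative sketch, not the paper's implementation: the toy environment, the greedy stand-in planner, the linearly decaying mixing coefficient beta, and names such as CorridorEnv, collect_episode, and planner_episodes are all assumptions introduced here for illustration.

```python
import random

class CorridorEnv:
    """Toy task (hypothetical): reach position n on a line via -1/+1 steps."""
    def __init__(self, n=10):
        self.n, self.pos = n, 0
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):
        self.pos = max(0, min(self.n, self.pos + action))
        done = self.pos == self.n
        return self.pos, (1.0 if done else -0.01), done

class GreedyPlanner:
    """Stand-in for the sub-optimal planner: always steps toward the goal."""
    def plan(self, state):
        return 1

class RandomPolicy:
    """Stand-in for the learning agent's current (still untrained) policy."""
    def act(self, state):
        return random.choice([-1, 1])

def collect_episode(env, policy, planner, episode, planner_episodes=50):
    """Roll out one episode, deferring to the planner with probability beta.

    beta decays linearly over the first planner_episodes episodes, so the
    planner leads exploration early on and the policy takes over later.
    """
    beta = max(0.0, 1.0 - episode / planner_episodes)
    transitions, state, done = [], env.reset(), False
    while not done:
        if random.random() < beta:
            action = planner.plan(state)   # planner-as-expert action
        else:
            action = policy.act(state)     # policy's own exploratory action
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
    return transitions

if __name__ == "__main__":
    env, policy, planner = CorridorEnv(), RandomPolicy(), GreedyPlanner()
    for ep in range(3):
        batch = collect_episode(env, policy, planner, ep)
        print(f"episode {ep}: collected {len(batch)} transitions")
```

In the paper's experiments, a real monitor-replanning planner and an RL learner would presumably replace these stubs; the decaying beta is one simple way to capture the stated intuition of letting the planner lead exploration only through the initial iterations.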
Keywords
Automated planning, Reinforcement learning, Decision-making