Scaling Data-Driven Robotics With Reward Sketching And Batch Reinforcement Learning
ROBOTICS: SCIENCE AND SYSTEMS XVI(2020)
摘要
By harnessing a growing dataset of robot experience, we learn control policies for a diverse and increasing set of related manipulation tasks. To make this possible, we introduce reward sketching an effective way of eliciting human preferences to learn the reward function for a new task. This reward function is then used to retrospectively annotate all historical data, collected for different tasks, with predicted rewards for the new task. The resulting massive annotated dataset can then be used to learn manipulation policies with batch reinforcement learning (RL) from visual input in a completely off-line way, i.e., without interactions with the real robot. This approach makes it possible to scale up RL in robotics, as we no longer need to run the robot for each step of learning We show that the trained hatch RI: agents, when deployed in real robots, can perform a variety of challenging tasks involving multiple interactions among rigid or deformable objects. Moreover, they display a significant degree of robustness and generalization. In some cases, they even outperform human teleoperators.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络