Single-Reset Divide & Conquer Imitation Learning
CoRR (2024)
Abstract
Demonstrations are commonly used to speed up the learning process of Deep
Reinforcement Learning algorithms. To cope with the difficulty of accessing
multiple demonstrations, some algorithms have been developed to learn from a
single demonstration. In particular, the Divide & Conquer Imitation Learning
algorithms leverage a sequential bias to learn a control policy for complex
robotic tasks using a single state-based demonstration. The latest version,
DCIL-II, demonstrates remarkable sample efficiency. This novel method operates
within an extended Goal-Conditioned Reinforcement Learning framework, ensuring
compatibility between intermediate and subsequent goals extracted from the
demonstration. However, a fundamental limitation arises from the assumption
that the system can be reset to specific states along the demonstrated
trajectory, confining the application to simulated systems. In response, we
introduce an extension called Single-Reset DCIL (SR-DCIL), designed to overcome
this constraint by relying on a single initial state reset rather than
sequential resets. To address this more challenging setting, we integrate two
mechanisms inspired by the Learning from Demonstrations literature, namely a
Demo-Buffer and Value Cloning, to guide the agent toward compatible success
states. In addition, we introduce Approximate Goal Switching to facilitate
training to reach goals distant from the reset state. Our paper makes several
contributions, highlighting the importance of the reset assumption in DCIL-II,
presenting the mechanisms of SR-DCIL variants and evaluating their performance
in challenging robotic tasks compared to DCIL-II. In summary, this work offers
insights into the significance of reset assumptions in the framework of DCIL
and proposes SR-DCIL, a first step toward a versatile algorithm capable of
learning control policies under a weaker reset assumption.