GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
CoRR (2023)
Abstract
Offline Goal-Conditioned RL (GCRL) offers a feasible paradigm for learning
general-purpose policies from diverse and multi-task offline datasets. Despite
notable recent progress, the predominant offline GCRL methods, mainly
model-free, face constraints in handling limited data and generalizing to
unseen goals. In this work, we propose Goal-conditioned Offline Planning
(GOPlan), a novel model-based framework that contains two key phases: (1)
pretraining a prior policy capable of capturing the multi-modal action
distribution within the multi-goal dataset; (2) employing the reanalysis method
with planning to generate imagined trajectories for fine-tuning policies.
Specifically, we base the prior policy on an advantage-weighted conditioned
generative adversarial network, which facilitates distinct mode separation,
mitigating the pitfalls of out-of-distribution (OOD) actions. For further
policy optimization, the reanalysis method generates high-quality imaginary
data by planning with learned models for both intra-trajectory and
inter-trajectory goals. With thorough experimental evaluations, we demonstrate
that GOPlan achieves state-of-the-art performance on various offline multi-goal
navigation and manipulation tasks. Moreover, our results highlight the superior
ability of GOPlan to handle small data budgets and generalize to OOD goals.
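
The distinction between intra-trajectory and inter-trajectory goals can be illustrated with a minimal sketch. It assumes one plausible reading: an intra-trajectory goal is a later state from the same trajectory, and an inter-trajectory goal is a state drawn from a different trajectory. The function name `sample_goal`, the `inter_prob` parameter, and the list-of-lists data layout are hypothetical and are not taken from the paper.

```python
import random


def sample_goal(trajectories, traj_idx, t, inter_prob=0.5, rng=random):
    """Pick a goal for the state at step t of trajectory traj_idx.

    Hypothetical illustration only, not GOPlan's implementation.
    Returns (goal_state, kind), where kind is "intra" or "inter".
    Assumes every trajectory is a non-empty list of states.
    """
    traj = trajectories[traj_idx]
    others = [i for i in range(len(trajectories)) if i != traj_idx]
    has_future = t + 1 < len(traj)

    # Inter-trajectory goal: a state taken from a different trajectory.
    if others and (not has_future or rng.random() < inter_prob):
        other = trajectories[rng.choice(others)]
        return rng.choice(other), "inter"

    if not has_future:
        raise ValueError("no future state and no other trajectory available")

    # Intra-trajectory goal: a strictly later state of the same trajectory.
    return traj[rng.randrange(t + 1, len(traj))], "intra"


if __name__ == "__main__":
    data = [[0, 1, 2, 3], [10, 11, 12]]
    print(sample_goal(data, traj_idx=0, t=1, inter_prob=0.5))
```

In this sketch, setting `inter_prob` to 0 yields only goals already reached later in the same trajectory, while raising it mixes in goals from other trajectories, for which the dataset contains no demonstration of how to reach them from the current state.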
Keywords
offline reinforcement learning, reinforcement learning, goal-conditioned