Solving Continual Offline Reinforcement Learning with Decision Transformer
CoRR (2024)
Abstract
Continual offline reinforcement learning (CORL) combines continual and
offline reinforcement learning, enabling agents to learn multiple tasks from
static datasets without forgetting prior tasks. However, CORL faces challenges
in balancing stability and plasticity. Existing methods, which employ
Actor-Critic (AC) structures and experience replay (ER), suffer from
distribution shift, low learning efficiency, and weak knowledge sharing
across tasks. We investigate whether the Decision Transformer (DT), another
offline RL paradigm, can serve as a more suitable offline continual learner
to address these issues. We first compare AC-based offline algorithms with DT
in the CORL framework. DT offers advantages in learning efficiency,
distribution-shift mitigation, and zero-shot generalization, but it
exacerbates forgetting during supervised parameter updates. We introduce
multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's
forgetting problem. MH-DT stores task-specific knowledge in multiple heads
while sharing knowledge through common components, and it employs
distillation and selective rehearsal to enhance current-task learning when a
replay buffer is available. In buffer-unavailable scenarios, LoRA-DT merges
less influential weights and fine-tunes DT's decisive MLP layer to adapt to
the current task. Extensive experiments on the MuJoCo and Meta-World
benchmarks demonstrate that our methods outperform SOTA CORL baselines, with
stronger learning capability and superior memory efficiency.
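To make the multi-head idea concrete, here is a minimal PyTorch-style sketch of what the abstract describes for MH-DT: a shared transformer trunk that enables knowledge sharing across tasks, plus one output head per task that isolates task-specific knowledge. The class and parameter names (MultiHeadDT, num_tasks, task_id) are illustrative assumptions, not the authors' code, and a real Decision Transformer would additionally condition on returns-to-go and past actions.

```python
import torch
import torch.nn as nn

class MultiHeadDT(nn.Module):
    """Illustrative sketch: shared trunk + per-task action heads."""
    def __init__(self, state_dim, act_dim, hidden_dim, num_tasks,
                 num_layers=3, num_heads=1):
        super().__init__()
        # Shared components: state embedding and transformer trunk.
        self.embed_state = nn.Linear(state_dim, hidden_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Task-specific heads: training only the current task's head keeps
        # earlier heads untouched, confining forgetting to the shared trunk.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, act_dim) for _ in range(num_tasks)])

    def forward(self, states, task_id):
        # states: (batch, seq_len, state_dim)
        h = self.trunk(self.embed_state(states))
        return self.heads[task_id](h)  # predicted actions for this task
```

Under this reading, the distillation and selective rehearsal the abstract mentions would act on the shared trunk, using replayed data from earlier tasks to keep its representations stable.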
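Similarly, a minimal sketch of the low-rank adaptation described for LoRA-DT, assuming a standard LoRA formulation: the pretrained weight of the targeted MLP layer is frozen, and each new task trains only a rank-r update B @ A, so per-task storage grows with r * (in + out) rather than in * out. The class name LoRALinear and the rank default are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative sketch: frozen base layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # shared weights stay fixed across tasks
        # Low-rank factors: A maps down to rank r, B maps back up.
        # B starts at zero so the adapted layer initially equals the base layer.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        # Frozen base output plus the task-specific low-rank correction.
        return self.base(x) + x @ (self.B @ self.A).T
```

The weight merging the abstract mentions would, in this formulation, correspond to folding a less influential adapter into the frozen weight (base.weight.data += B @ A), bounding the number of adapters stored as tasks accumulate; how the paper selects which weights to merge is not specified in the abstract.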