Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout
arXiv (2023)
Abstract
Goal-conditioned hierarchical reinforcement learning (HRL) presents a
promising approach for enabling effective exploration in complex, long-horizon
reinforcement learning (RL) tasks through temporal abstraction. Empirically,
heightened inter-level communication and coordination can induce more stable
and robust policy improvement in hierarchical systems. Yet, most existing
goal-conditioned HRL algorithms have focused primarily on subgoal
discovery, neglecting inter-level cooperation. Here, we propose a
goal-conditioned HRL framework named Guided Cooperation via Model-based Rollout
(GCMR), which aims to promote inter-level information synchronization and
cooperation by exploiting forward dynamics. First, GCMR mitigates the
state-transition error that arises in off-policy correction via model-based
rollout, thereby improving sample efficiency.
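As a minimal sketch of this first component (assuming a HIRO-style relabeling
scheme and a learned forward model f(s, a) → s'; the candidate-sampling details
are illustrative, not the paper's exact procedure), candidate subgoals can be
scored against model-based rollouts of the current low-level policy rather than
against stale stored transitions:

```python
import numpy as np

def model_rollout(dynamics, low_policy, s0, subgoal, horizon):
    """Roll the current low-level policy through the learned forward
    model f(s, a) -> s_next, starting from s0."""
    states, actions = [s0], []
    s = s0
    for _ in range(horizon):
        a = low_policy(s, subgoal)
        s = dynamics(s, a)              # model-predicted next state
        actions.append(a)
        states.append(s)
    return np.array(states), np.array(actions)

def relabel_subgoal(dynamics, low_policy, s0, stored_actions, orig_goal,
                    n_candidates=8):
    """Off-policy correction with model-based rollout: pick the candidate
    subgoal under which a rollout of the *current* low-level policy
    through the learned model best reproduces the stored action
    sequence."""
    candidates = [orig_goal] + [
        orig_goal + np.random.normal(scale=0.5, size=orig_goal.shape)
        for _ in range(n_candidates - 1)
    ]
    best_goal, best_score = orig_goal, -np.inf
    for g in candidates:
        _, acts = model_rollout(dynamics, low_policy, s0, g,
                                len(stored_actions))
        score = -np.sum((acts - stored_actions) ** 2)  # ∝ Gaussian log-likelihood
        if score > best_score:
            best_goal, best_score = g, score
    return best_goal
```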
Second, to prevent disruption by unseen subgoals and states, the gradients of
the lower-level Q-function are constrained using a gradient penalty with a
model-inferred upper bound, leading to a more stable behavioral policy that is
conducive to effective exploration.
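A hedged sketch of this second component (assuming the penalty is taken with
respect to the subgoal input and added to the lower-level critic's TD loss with
some weighting coefficient; the exact bound and inputs in the paper may
differ):

```python
import torch

def lower_q_gradient_penalty(q_net, states, subgoals, actions, upper_bound):
    """Hinge-style gradient penalty: push the norm of dQ/d(subgoal) below
    an upper bound assumed to be inferred from the learned dynamics model,
    so unseen subgoals cannot produce explosive value estimates."""
    subgoals = subgoals.detach().clone().requires_grad_(True)
    q_values = q_net(states, subgoals, actions)
    grads = torch.autograd.grad(q_values.sum(), subgoals,
                                create_graph=True)[0]
    grad_norms = grads.norm(dim=-1)
    # Penalize only the excess above the bound (zero penalty inside it).
    return torch.clamp(grad_norms - upper_bound, min=0.0).pow(2).mean()
```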
Third, we propose one-step rollout-based planning that uses the higher-level
critic to guide the lower-level policy: we estimate the value of the
lower-level policy's future states with the higher-level critic function,
thereby transmitting global task information downward to avoid local pitfalls.
Together, these three components are expected to substantially facilitate
inter-level cooperation.
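The third component might look roughly as follows (a sketch under assumed
names: low_actor, dynamics_model, and high_critic are hypothetical handles, and
the Gaussian candidate perturbation is an illustrative choice, not the paper's
exact scheme):

```python
import torch

@torch.no_grad()
def guided_low_action(low_actor, dynamics_model, high_critic,
                      state, subgoal, task_goal,
                      n_candidates=10, noise_std=0.1):
    """One-step rollout-based planning: perturb the low-level actor's
    action, predict each candidate's next state with the learned forward
    model, and rank candidates by the higher-level critic's value so that
    global task information guides low-level action selection."""
    base_action = low_actor(state, subgoal)              # (action_dim,)
    candidates = base_action + noise_std * torch.randn(
        n_candidates, base_action.shape[-1])
    candidates[0] = base_action                          # keep the unperturbed action
    next_states = dynamics_model(state.expand(n_candidates, -1), candidates)
    values = high_critic(next_states, task_goal.expand(n_candidates, -1))
    return candidates[values.argmax()]
```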
Experimental results demonstrate that incorporating the proposed GCMR
framework with a disentangled variant of HIGL, namely ACLG, yields more stable
and robust policy improvement than various baselines and significantly
outperforms previous state-of-the-art algorithms.