On the Importance of the Policy Structure in Offline Reinforcement Learning

Takayuki Osa, Akinobu Hayashi, Pranav Deo, Naoki Morihira, Takahide Yoshiike

ICLR 2023

Abstract
Offline reinforcement learning (RL) has recently attracted a great deal of attention as an approach to learning a policy from past experience. Recent studies have reported the challenges of offline RL, such as the difficulty of estimating the values of actions that lie outside the data distribution. To mitigate these issues, we propose an algorithm that leverages a mixture of deterministic policies. In our framework, the state-action space is divided by learning discrete latent variables, and sub-policies corresponding to each region are trained. The proposed algorithm, which we call the Value-Weighted Variational Auto-Encoder (V2AE), is derived by considering the variational lower bound of the offline RL objective function. The aim of this work is to shed light on the importance of the policy structure in offline RL. We show empirically that the use of the proposed mixture policy can reduce the accumulation of the approximation error in offline RL, which was reported in previous studies. Experimental results also indicate that introducing the policy structure improves performance on tasks from the D4RL benchmark datasets.
Keywords
offline reinforcement learning, discrete latent representations
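To make the policy structure described in the abstract concrete, below is a minimal sketch of a mixture-of-deterministic-policies actor with a discrete latent variable. It is not the authors' V2AE implementation; the class and parameter names (MixtureOfDeterministicPolicies, num_components, hidden_dim) are illustrative assumptions, and the value-weighted training objective is omitted.

```python
# Minimal sketch: a gating network infers a categorical latent variable that
# partitions the state space, and one deterministic sub-policy head is used
# per latent value. Illustrative only; not the paper's implementation.
import torch
import torch.nn as nn


class MixtureOfDeterministicPolicies(nn.Module):
    def __init__(self, state_dim: int, action_dim: int,
                 num_components: int = 8, hidden_dim: int = 256):
        super().__init__()
        # Gating network: logits over the discrete latent variable,
        # i.e., which region (sub-policy) a state belongs to.
        self.gate = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_components),
        )
        # One deterministic sub-policy per discrete latent value.
        self.sub_policies = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, action_dim), nn.Tanh(),
            )
            for _ in range(num_components)
        ])

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        """Select the most likely component per state and return its action."""
        logits = self.gate(state)                       # (batch, K)
        component = logits.argmax(dim=-1)               # (batch,)
        # Evaluate all sub-policies, then gather the selected component.
        actions = torch.stack([p(state) for p in self.sub_policies], dim=1)
        idx = component.view(-1, 1, 1).expand(-1, 1, actions.size(-1))
        return actions.gather(1, idx).squeeze(1)        # (batch, action_dim)


# Usage: one forward pass on a random batch of states.
policy = MixtureOfDeterministicPolicies(state_dim=17, action_dim=6)
action = policy(torch.randn(32, 17))
print(action.shape)  # torch.Size([32, 6])
```

In an offline RL setting, each sub-policy would only need to fit actions from its own region of the dataset, which is the intuition behind the paper's claim that the mixture structure limits the accumulation of approximation error.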