Posterior Sampling Model-based Policy Optimization under Approximate Inference

ICLR 2023

Abstract
Model-based reinforcement learning (MBRL) algorithms hold tremendous promise for improving sample efficiency in online RL. However, many popular MBRL algorithms do not handle the trade-off between exploration and exploitation properly. Posterior sampling reinforcement learning (PSRL) is a promising approach for automatically trading off exploration and exploitation, but its theoretical guarantees only hold under exact inference. In this paper, we show that adopting the same methodology as exact PSRL can be fairly suboptimal under approximate inference. Motivated by this analysis, we propose an improved factorization for the posterior distribution over policies that removes the conditional independence between the policy and the data given the model. Building on this factorization, we propose a general algorithmic framework for PSRL under approximate inference and a practical instantiation of it. Empirically, our algorithm surpasses the baseline methods by a significant margin on both dense-reward and sparse-reward tasks from the DM Control Suite, OpenAI Gym, and Metaworld benchmarks.
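As a rough illustration (using our own notation, not necessarily the paper's), let $M$ denote the model, $\pi$ the policy, and $\mathcal{D}$ the collected data. The two factorizations contrasted in the abstract can be sketched as follows, where the proposed variant keeps the policy's direct dependence on the data:

```latex
% Schematic sketch of the posterior factorizations described in the abstract.
% The symbols M (model), \pi (policy), and D (data) are illustrative notation.
\begin{align*}
  % Standard PSRL-style factorization: the policy is treated as
  % conditionally independent of the data given the model.
  p(\pi, M \mid \mathcal{D}) &\approx p(\pi \mid M)\, p(M \mid \mathcal{D}) \\
  % Proposed factorization: the conditional independence assumption is
  % removed, so the policy posterior also conditions on the data directly.
  p(\pi, M \mid \mathcal{D}) &= p(\pi \mid M, \mathcal{D})\, p(M \mid \mathcal{D})
\end{align*}
```

Under exact inference the first form loses nothing, since the exact model posterior already carries all the information in the data; the abstract's argument is that once the model posterior is only approximate, conditioning the policy on the data as well (the second form) becomes important.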
Keywords
Reinforcement learning, Posterior sampling, Model-based reinforcement learning