Dyna-style Model-Based Reinforcement Learning with Model-Free Policy Optimization

Knowledge-Based Systems (2024)

Abstract
Dyna-style model-based reinforcement learning (MBRL) methods have demonstrated superior sample efficiency compared with their model-free counterparts, largely because they leverage learned models. Even so, applying these learned models effectively remains challenging, mainly because of the intricate interdependence between model learning and policy optimization, which leaves a significant theoretical gap in the field. This paper bridges that gap by providing the first comprehensive theoretical analysis of Dyna-style MBRL and by establishing a return bound in deterministic environments. Building on this analysis, we propose a novel schema called Model-Based Reinforcement Learning with Model-Free Policy Optimization (MBMFPO). Unlike existing MBRL methods, the proposed schema integrates model-free policy optimization into the MBRL framework, together with several additional techniques. Experimental results on a range of continuous control tasks show that MBMFPO significantly improves both sample efficiency and final performance over baseline methods. Extensive ablation studies further support the effectiveness of each individual component of the MBMFPO schema. This work advances both the theoretical analysis and the practical application of Dyna-style MBRL, paving the way for more efficient reinforcement learning methods.
Keywords
Reinforcement learning, Robotics, Data efficiency
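
For readers unfamiliar with the term, the general Dyna-style loop that the abstract refers to can be illustrated with classic tabular Dyna-Q on a toy deterministic chain. This is only a minimal sketch of the generic idea (learn a model from real transitions, then reuse it for extra model-free value updates); it is not MBMFPO and does not reproduce the paper's return bound or its additional techniques. The chain environment, the function name `dyna_q`, and all hyperparameters are assumptions chosen for illustration.

```python
import random


def dyna_q(n_states=6, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical deterministic chain (illustrative only)."""
    rng = random.Random(seed)
    actions = (0, 1)          # 0 = move left, 1 = move right
    goal = n_states - 1       # reaching the last state ends the episode

    def step(s, a):
        # Deterministic toy dynamics: reward 1.0 only on reaching the goal.
        s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
        return s2, (1.0 if s2 == goal else 0.0), s2 == goal

    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}                # learned model: (s, a) -> (s', r, done)

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # Model-free Q-learning update, used for real and simulated data alike.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s2, r, done = step(s, a)          # real interaction
            update(s, a, r, s2, done)         # direct RL from real data
            model[(s, a)] = (s2, r, done)     # model learning
            for _ in range(planning_steps):   # planning with the learned model
                ps, pa = rng.choice(list(model))
                ps2, pr, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q, model


q_values, learned_model = dyna_q()
```

In this sketch the same update rule consumes both real and model-generated transitions, which is the sense in which Dyna-style methods trade extra computation for fewer environment interactions. Because the toy dynamics are deterministic, the tabular model is exact for every state-action pair it has seen.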