Behavior Constraining in Weight Space for Offline Reinforcement Learning.

The European Symposium on Artificial Neural Networks (ESANN), 2021

Abstract
In offline reinforcement learning, a policy must be learned from a single pre-collected dataset. Policies are therefore typically regularized during training to behave similarly to the data-generating policy, by adding a penalty based on a divergence between the action distributions of the generating and trained policies. We propose a new algorithm that instead constrains the policy directly in its weight space, and demonstrate its effectiveness in experiments.
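The weight-space constraint described above can be sketched as a penalty on the parameter distance between the trained policy and a reference policy fit to the offline data. The function and setup below are illustrative assumptions (a squared L2 distance over hypothetical parameter arrays), not the paper's exact formulation.

```python
import numpy as np

def weight_space_penalty(policy_weights, behavior_weights, coef=1.0):
    """Squared L2 distance between trained and behavior policy weights.

    Hypothetical illustration of a weight-space constraint: the penalty
    grows as the trained policy's parameters drift from those of a
    policy fit to the offline dataset.
    """
    return coef * sum(
        float(np.sum((w - b) ** 2))
        for w, b in zip(policy_weights, behavior_weights)
    )

# Hypothetical two-layer policy parameters.
rng = np.random.default_rng(0)
behavior = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]
trained = [w + 0.1 for w in behavior]  # slightly drifted copy

penalty = weight_space_penalty(trained, behavior)  # ~0.48 here
```

In practice such a term would be added to the policy's training loss, playing the role that the action-distribution divergence penalty plays in standard behavior-regularized offline RL.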
Keywords
reinforcement learning, weight space, behavior