Entropy-regularized Point-based Value Iteration
CoRR (2024)
Abstract
Model-based planners for partially observable problems must accommodate both
model uncertainty during planning and goal uncertainty during objective
inference. However, model-based planners may be brittle under these types of
uncertainty because they rely on an exact model and tend to commit to a single
optimal behavior. Inspired by results in the model-free setting, we propose an
entropy-regularized model-based planner for partially observable problems.
Entropy regularization promotes policy robustness for planning and objective
inference by encouraging policies to be no more committed to a single action
than necessary. We evaluate the robustness and objective inference performance
of entropy-regularized policies in three problem domains. Our results show that
entropy-regularized policies outperform non-entropy-regularized baselines,
achieving higher expected returns under modeling errors and higher accuracy
during objective inference.
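The core idea of entropy regularization can be illustrated outside the POMDP setting. The sketch below is not the authors' point-based algorithm; it is a minimal, assumed tabular-MDP analogue in which the hard max of standard value iteration is replaced by a temperature-scaled soft maximum, so the resulting policy is a softmax over Q-values and never fully commits to a single action.

```python
import numpy as np

def soft_value_iteration(P, R, gamma=0.9, tau=0.5, iters=200):
    """Entropy-regularized value iteration for a tabular MDP (illustrative).

    P: transitions, shape (A, S, S) with P[a, s, s'] = Pr(s' | s, a).
    R: rewards, shape (S, A).
    The hard max of standard value iteration is replaced by the soft
    maximum tau * logsumexp(Q / tau); the induced policy is a softmax
    over Q-values rather than a deterministic argmax.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q[s, a] = R[s, a] + gamma * E_{s'}[V(s')]
        Q = R + gamma * np.einsum("ast,t->as", P, V).T
        # numerically stable soft max over actions
        m = Q.max(axis=1)
        V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    # entropy-regularized policy: softmax of advantages
    pi = np.exp((Q - V[:, None]) / tau)
    pi /= pi.sum(axis=1, keepdims=True)
    return V, pi

# Toy two-state, two-action MDP (hypothetical numbers for illustration).
P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # action 0
              [[0.5, 0.5], [0.5, 0.5]]])  # action 1
R = np.array([[1.0, 0.5],
              [0.0, 0.2]])
V, pi = soft_value_iteration(P, R)
```

Lowering the temperature `tau` recovers behavior closer to the standard greedy policy; higher `tau` keeps more probability mass on suboptimal actions, which is the mechanism the paper leverages for robustness and objective inference.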