Importance of Representation Learning for Off-Policy Fitted Q-Evaluation

semanticscholar (2021)

Abstract
The goal of offline policy evaluation (OPE) is to evaluate target policies from logged data whose distribution may differ substantially from the target policy's. One of the most popular empirical approaches to OPE is fitted Q-evaluation (FQE). With linear function approximation, several works have found that FQE (and other OPE methods) exhibits error amplification that is exponential in the problem horizon, except under very strong assumptions. Given the empirical success of deep FQE, in this work we examine the effect of implicit regularization, through deep architectures and loss functions, on the divergence and performance of FQE. We find that divergence does occur with simple feed-forward architectures, but can be mitigated with various architectural and algorithmic techniques, such as ResNet architectures, learning a representation shared across multiple target policies, and hypermodels. Our results suggest interesting directions for future work, including analyzing how architecture affects the stability of the fixed-point updates that are ubiquitous in modern reinforcement learning.
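
To make the setup concrete, below is a minimal sketch of the FQE loop on a logged dataset, assuming a discrete action space, a simple feed-forward Q-network in PyTorch, and synthetic transitions. All names (QNetwork, target_policy) and hyperparameters are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

# Illustrative sketch of fitted Q-evaluation (FQE): regress Q toward the
# Bellman target under the *target* policy, using only logged transitions.
STATE_DIM, NUM_ACTIONS, GAMMA = 8, 4, 0.99

class QNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, NUM_ACTIONS),
        )

    def forward(self, s):
        return self.net(s)

def target_policy(states):
    # Stand-in deterministic target policy; in practice this is the policy
    # being evaluated, not the behavior policy that generated the data.
    return torch.zeros(states.shape[0], dtype=torch.long)

# Synthetic logged dataset of transitions (s, a, r, s', done).
N = 1024
states = torch.randn(N, STATE_DIM)
actions = torch.randint(0, NUM_ACTIONS, (N,))
rewards = torch.randn(N)
next_states = torch.randn(N, STATE_DIM)
dones = torch.zeros(N)

q_net, q_frozen = QNetwork(), QNetwork()
q_frozen.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

for _ in range(50):                       # outer fixed-point iterations
    with torch.no_grad():
        next_actions = target_policy(next_states)
        next_q = q_frozen(next_states).gather(1, next_actions.unsqueeze(1)).squeeze(1)
        targets = rewards + GAMMA * (1 - dones) * next_q

    for _ in range(20):                   # inner regression steps on fixed targets
        q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q_pred, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    q_frozen.load_state_dict(q_net.state_dict())

# Policy value estimate: average Q at initial states under the target policy.
init_states = torch.randn(256, STATE_DIM)
with torch.no_grad():
    init_actions = target_policy(init_states)
    value_estimate = q_net(init_states).gather(1, init_actions.unsqueeze(1)).mean()
print(float(value_estimate))
```

Swapping the feed-forward QNetwork for a ResNet-style architecture, a representation shared across several target policies, or a hypermodel is what the paper studies as a way to mitigate divergence of this fixed-point iteration.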