Stable and Efficient Policy Evaluation.
IEEE Transactions on Neural Networks and Learning Systems (2019)
Abstract
Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy. However, there are two long-standing issues lying in this prediction problem that need to be tackled: off-policy stability and on-policy efficiency. The conventional temporal difference (TD) algorithm is known to perform very well in the on-policy setting, yet is not of...
Keywords
Stability criteria, Approximation algorithms, Prediction algorithms, Training, Learning systems