Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming

msra(1996)

Abstract
We introduce a new policy iteration method for dynamic programming problems with discounted and undiscounted cost. The method is based on the notion of temporal differences, and is primarily geared to the case of large and complex problems where the use of approximations is essential. We develop the theory of the method without approximation, we describe how to embed it within a neuro-dynamic programming/reinforcement learning context where feature-based approximation architectures are used, we relate it to TD(λ) methods, and we illustrate its use in the training of a Tetris-playing program.
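The abstract relates the method to TD(λ). As a point of reference, the following is a minimal, hedged sketch of standard tabular TD(λ) policy evaluation with accumulating eligibility traces, applied to a simple symmetric random walk; the environment, parameter values, and function name are illustrative assumptions, not the paper's algorithm or its Tetris setup.

```python
import random

def td_lambda(num_states=5, episodes=500, alpha=0.1, gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) evaluation of a fixed policy on a symmetric random walk.

    States 1..num_states are non-terminal; 0 and num_states+1 are terminal.
    Reward is +1 on exiting to the right, 0 otherwise, so V(s) estimates the
    probability of terminating on the right. All parameters here are assumed
    illustrative values, not taken from the paper.
    """
    rng = random.Random(seed)
    V = [0.0] * (num_states + 2)          # value estimates; terminals stay at 0
    for _ in range(episodes):
        e = [0.0] * (num_states + 2)      # eligibility traces, reset each episode
        s = (num_states + 1) // 2         # start in the middle state
        while 0 < s <= num_states:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s_next == num_states + 1 else 0.0
            delta = r + gamma * V[s_next] - V[s]    # the temporal difference
            e[s] += 1.0                             # accumulating trace
            for i in range(1, num_states + 1):
                V[i] += alpha * delta * e[i]        # credit past states via traces
                e[i] *= gamma * lam                 # decay all traces
            s = s_next
    return V[1:num_states + 1]

values = td_lambda()
print(values)
```

The learned values should approximate 1/6, 2/6, ..., 5/6 for the five non-terminal states, increasing from left to right.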