On the Expected Dynamics of Nonlinear TD Learning

arXiv: Learning (2019)

Abstract
While there are convergence guarantees for temporal difference (TD) learning with linear function approximators, the situation for nonlinear models is far less understood, and divergent examples are known. Here we take a first step towards extending theoretical convergence guarantees to TD learning with nonlinear function approximation. More precisely, we consider the expected dynamics of the TD(0) algorithm. We prove that this ODE is attracted to a compact set for smooth homogeneous functions, including some ReLU networks. For over-parametrized and well-conditioned functions in sufficiently reversible environments, we prove convergence to the global optimum. This result improves when using $k$-step or $\lambda$-returns. Finally, we generalize a divergent counterexample to a family of divergent problems, motivating the assumptions needed to prove convergence.
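To make the object of study concrete, the following is a minimal sketch of the (semi-gradient) TD(0) update with a nonlinear value approximator, whose expected dynamics the paper analyzes as an ODE. The environment (a small random-walk chain), the one-hidden-layer tanh network, and all hyperparameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative setup (assumed, not from the paper): a 5-state random-walk
# chain with terminal states at both ends and reward 1 on reaching the
# right end; the value function is a small one-hidden-layer tanh network.
rng = np.random.default_rng(0)
n_states, hidden = 5, 8
W1 = rng.normal(scale=0.5, size=(hidden, n_states))
w2 = rng.normal(scale=0.5, size=hidden)

def features(s):
    # one-hot encoding of the state
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

def value(s):
    # nonlinear value estimate V_theta(s) = w2^T tanh(W1 x_s)
    return w2 @ np.tanh(W1 @ features(s))

def grad(s):
    # gradient of V_theta(s) with respect to (W1, w2)
    x = features(s)
    h = np.tanh(W1 @ x)
    return np.outer(w2 * (1.0 - h**2), x), h

gamma, alpha = 0.9, 0.05
s = 2
for _ in range(10000):
    s_next = s + rng.choice([-1, 1])              # unbiased random walk
    terminal = s_next in (0, n_states - 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    # TD error: bootstrap from V(s') unless s' is terminal
    delta = r + (0.0 if terminal else gamma * value(s_next)) - value(s)
    dW1, dw2 = grad(s)
    W1 += alpha * delta * dW1                     # semi-gradient TD(0) step
    w2 += alpha * delta * dw2
    s = 2 if terminal else s_next                 # restart episodes mid-chain
```

Averaging this stochastic update over the state distribution yields the expected-dynamics ODE studied in the paper; the code above is only the sampled iteration from which that ODE is derived.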