Kernel-Based Least Squares Temporal Difference With Gradient Correction

Taek Lyul Song, Dazi Li, Lei Cao, Kotaro Hirasawa

IEEE Transactions on Neural Networks and Learning Systems (2016)

Abstract
A least squares temporal difference with gradient correction (LS-TDC) algorithm and its kernel-based version, kernel-based LS-TDC (KLS-TDC), are proposed as policy evaluation algorithms for reinforcement learning (RL). LS-TDC is derived from the TDC algorithm; because TDC is obtained by minimizing the mean-square projected Bellman error, LS-TDC inherits its strong convergence properties. The least squares technique removes the step-size tuning required by the original TDC and enhances robustness. In KLS-TDC, the kernel method allows feature vectors to be selected automatically, and approximate linear dependence analysis is performed to achieve kernel sparsification. In addition, a policy iteration strategy built on KLS-TDC is constructed to solve control learning problems. The convergence and parameter sensitivity of both LS-TDC and KLS-TDC are tested on on-policy learning, off-policy learning, and control learning problems. Experimental results, compared with a series of corresponding RL algorithms, demonstrate that both LS-TDC and KLS-TDC achieve better approximation and convergence performance, higher sample efficiency, a smaller parameter-tuning burden, and lower sensitivity to parameters.
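For orientation, below is a minimal Python sketch of the two building blocks the abstract names: the standard TDC update (Sutton et al., 2009), which LS-TDC replaces with a least-squares solve, and the approximate linear dependence (ALD) test (Engel et al., 2004) used for kernel sparsification. The function names, signatures, and default parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward,
             gamma=0.99, alpha=0.01, beta=0.05):
    """One standard TDC update on a transition (phi -> phi_next, reward).

    theta: main weight vector; w: auxiliary correction weights.
    Step sizes alpha and beta are exactly what the paper's least-squares
    formulation aims to eliminate.
    """
    delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
    theta = theta + alpha * (delta * phi - gamma * phi_next * (phi @ w))
    w = w + beta * (delta - phi @ w) * phi
    return theta, w

def ald_admits(K_inv, k_vec, k_xx, nu=1e-3):
    """ALD test: should a new sample be added to the kernel dictionary?

    K_inv: inverse kernel matrix of the current dictionary;
    k_vec: kernel evaluations between the new sample and dictionary entries;
    k_xx: kernel self-similarity of the new sample; nu: sparsity threshold.
    """
    # Residual of projecting the new feature onto the dictionary span.
    residual = k_xx - k_vec @ K_inv @ k_vec
    return residual > nu
```

Per the abstract, LS-TDC replaces the stochastic theta and w updates above with a least-squares solution, which is why no alpha or beta tuning is needed, while KLS-TDC applies the ALD test to keep the kernel dictionary sparse.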
Keywords
least squares temporal difference, kernel-based