Taming "data-hungry" reinforcement learning? Stability in continuous state-action spaces
CoRR(2024)
摘要
We introduce a novel framework for analyzing reinforcement learning (RL) in
continuous state-action spaces, and use it to prove fast rates of convergence
in both off-line and on-line settings. Our analysis highlights two key
stability properties, relating to how changes in value functions and/or
policies affect the Bellman operator and occupation measures. We argue that
these properties are satisfied in many continuous state-action Markov decision
processes, and demonstrate how they arise naturally when using linear function
approximation methods. Our analysis offers fresh perspectives on the roles of
pessimism and optimism in off-line and on-line RL, and highlights the
connection between off-line RL and transfer learning.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要