No-Regret Reinforcement Learning in Smooth MDPs
CoRR (2024)
Abstract
Obtaining no-regret guarantees for reinforcement learning (RL) in problems
with continuous state and/or action spaces is still one of the major open
challenges in the field. Recently, a variety of solutions have been proposed,
but outside of very specific settings, the general problem remains
unsolved. In this paper, we introduce a novel structural assumption on the
Markov decision processes (MDPs), namely ν-smoothness, that generalizes
most of the settings proposed so far (e.g., linear MDPs and Lipschitz MDPs). To
face this challenging scenario, we propose two algorithms for regret
minimization in ν-smooth MDPs. Both algorithms build upon the idea of
constructing an MDP representation through an orthogonal feature map based on
Legendre polynomials. The first algorithm, Legendre-Eleanor, achieves
the no-regret property under weaker assumptions but is computationally
inefficient, whereas the second one, Legendre-LSVI, runs in polynomial
time, although for a smaller class of problems. After analyzing their regret
properties, we compare our results with state-of-the-art ones from RL theory,
showing that our algorithms achieve the best guarantees.
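
To make the idea of an orthogonal feature map based on Legendre polynomials concrete, the following is a minimal sketch for a one-dimensional variable rescaled to [-1, 1]. It is an illustration under stated assumptions, not the construction used by Legendre-Eleanor or Legendre-LSVI: the function names `legendre_features` and `gram_matrix`, the choice of domain, and the normalization are hypothetical, and the abstract does not specify how the map is extended to multi-dimensional state-action spaces.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_features(x, degree):
    """Orthonormal Legendre features of points x in [-1, 1].

    Hypothetical helper for illustration only.
    Returns an array of shape (len(x), degree + 1).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Column n holds the standard Legendre polynomial P_n evaluated at x.
    vander = legendre.legvander(x, degree)
    # P_n has squared L2 norm 2 / (2n + 1) on [-1, 1]; rescale to unit norm.
    scale = np.sqrt((2 * np.arange(degree + 1) + 1) / 2.0)
    return vander * scale


def gram_matrix(degree):
    """Gram matrix of the features, computed by Gauss-Legendre quadrature.

    With degree + 1 nodes the quadrature is exact for the products of
    features integrated here, so the result should be the identity
    up to floating-point error.
    """
    nodes, weights = legendre.leggauss(degree + 1)
    phi = legendre_features(nodes, degree)
    return phi.T @ (weights[:, None] * phi)
```

Calling `gram_matrix(4)` gives a numerical check that the five features are mutually orthogonal with unit norm, which is the property that makes such a basis convenient for representing smooth functions with a linear model.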