Speedy Q-learning: a computationally efficient reinforcement learning algorithm with a near-optimal rate of convergence

Journal of Machine Learning Research (2013)

Abstract
We consider the problem of model-free reinforcement learning (RL) in Markovian decision processes (MDPs) under the probably approximately correct (PAC) model. We introduce a new variant of Q-learning, called speedy Q-learning (SQL), to address the problem of slow convergence in the standard Q-learning algorithm, and prove PAC bounds on the performance of this algorithm. The bounds indicate that for any MDP with n state-action pairs and discount factor γ ∈ [0, 1), a total number of O(n log(n)/((1 − γ)⁴ε²)) steps suffices for SQL to converge to an ε-optimal action-value function with high probability. We also derive a lower bound of Ω(n/((1 − γ)²ε²)) for all RL algorithms, which matches the upper bound in terms of n (up to a logarithmic factor) and ε. Moreover, our results have better dependencies on ε and 1 − γ (with the same dependency on n), and are thus tighter than the best available results for Q-learning. The SQL algorithm also improves on existing results for batch Q-value iteration in terms of the computational budget required to achieve a near-optimal solution.
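The abstract characterizes SQL only through its sample-complexity bounds. As a concrete illustration, the sketch below shows a tabular, synchronous SQL-style update in Python; the specific update rule (two successive Q estimates, empirical Bellman backups of both, and an aggressive 1/(k+1) step size), the `sample_next` generative-model hook, and the `rewards` array are assumptions made for this sketch and are not stated in the abstract above.

```python
import numpy as np

def speedy_q_learning(sample_next, rewards, n_states, n_actions,
                      gamma=0.9, n_iters=10_000):
    """Illustrative tabular sketch of a speedy-Q-learning-style algorithm.

    sample_next(s, a) is a hypothetical environment hook returning a sampled
    next state; rewards is an (n_states, n_actions) array of expected rewards.
    """
    q_prev = np.zeros((n_states, n_actions))  # Q_{k-1}
    q = np.zeros((n_states, n_actions))       # Q_k
    for k in range(n_iters):
        alpha = 1.0 / (k + 1)                 # assumed polynomial step size
        q_next = np.empty_like(q)
        for s in range(n_states):
            for a in range(n_actions):
                s2 = sample_next(s, a)        # one sampled transition per (s, a)
                # Empirical Bellman backups of the previous and current estimates
                t_prev = rewards[s, a] + gamma * q_prev[s2].max()
                t_curr = rewards[s, a] + gamma * q[s2].max()
                # SQL-style update: small step toward the old backup,
                # large step toward the change between successive backups.
                q_next[s, a] = (q[s, a]
                                + alpha * (t_prev - q[s, a])
                                + (1 - alpha) * (t_curr - t_prev))
        q_prev, q = q, q_next
    return q
```

Each iteration of this sketch draws one sample per state-action pair, so the iteration count times n roughly corresponds to the total step count appearing in the bound quoted above.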