$Q$-learning with Logarithmic Regret

Cited by: 0|Views122

Abstract:

This paper presents the first non-asymptotic result showing that a model-free algorithm can achieve a logarithmic cumulative regret for episodic tabular reinforcement learning if there exists a strictly positive sub-optimality gap in the optimal $Q$-function. We prove that the optimistic $Q$-learning studied in [Jin et al. 2018] enjoys ...More

Code:

Data:

Get fulltext within 24h
Bibtex
Your rating :
0

 

Tags
Comments