Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
NeurIPS (2023)
Abstract
In this paper, we prove the first Bayesian regret bounds for Thompson
Sampling in reinforcement learning in a multitude of settings. We simplify the
learning problem using a discrete set of surrogate environments, and present a
refined analysis of the information ratio using posterior consistency. This
leads to an upper bound of order O(H√(d_{l_1} T)) in the time-inhomogeneous
reinforcement learning problem, where H is the episode length
and d_{l_1} is the Kolmogorov l_1-dimension of the space of environments.
We then find concrete bounds on d_{l_1} in a variety of settings, such as
tabular, linear, and finite mixtures, and discuss how our results are either
the first of their kind or improve the state-of-the-art.
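For readers unfamiliar with the algorithmic template the abstract analyzes, the following is a minimal, generic sketch of Thompson Sampling for episodic RL (posterior sampling for RL): sample an environment from the posterior, act optimally in the sampled environment for one episode, then update the posterior. Everything concrete here (the random tabular MDP, a Dirichlet posterior over transitions only, rewards treated as known) is an illustrative assumption for brevity, not the paper's construction or analysis.

```python
import numpy as np


def psrl(n_states=3, n_actions=2, horizon=5, episodes=200, seed=0):
    """Sketch of posterior sampling for episodic tabular RL.

    Returns the average per-episode reward collected while learning.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical true environment (illustrative, not from the paper):
    # random transition kernel and rewards in [0, 1).
    true_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    true_R = rng.random((n_states, n_actions))

    # Dirichlet(1) prior pseudo-counts over next states for each (s, a);
    # for simplicity the rewards are assumed known to the agent.
    counts = np.ones((n_states, n_actions, n_states))

    total_reward = 0.0
    for _ in range(episodes):
        # Thompson step: draw one plausible MDP from the posterior.
        P = np.empty_like(counts)
        for s in range(n_states):
            for a in range(n_actions):
                P[s, a] = rng.dirichlet(counts[s, a])

        # Backward induction computes the optimal policy in the sampled MDP.
        V = np.zeros(n_states)
        policy = np.zeros((horizon, n_states), dtype=int)
        for h in reversed(range(horizon)):
            Q = true_R + P @ V          # Q[s, a] for the sampled model
            policy[h] = Q.argmax(axis=1)
            V = Q.max(axis=1)

        # Roll out one episode in the true environment; update the posterior.
        s = 0
        for h in range(horizon):
            a = policy[h, s]
            total_reward += true_R[s, a]
            s_next = rng.choice(n_states, p=true_P[s, a])
            counts[s, a, s_next] += 1
            s = s_next

    return total_reward / episodes
```

The regret studied in the paper is, roughly, the gap between the reward of the optimal policy and the reward such a procedure accumulates, averaged over the prior; the bound above controls how this gap grows with T episodes.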