Ensemble sampling for linear bandits: small ensembles suffice.
CoRR (2023)
Abstract
We provide the first useful, rigorous analysis of ensemble sampling for the
stochastic linear bandit setting. In particular, we show that, under standard
assumptions, for a $d$-dimensional stochastic linear bandit with an interaction
horizon $T$, ensemble sampling with an ensemble of size $m$ on the order of $d
\log T$ incurs regret bounded by order $(d \log T)^{5/2} \sqrt{T}$. Ours is the
first result in any structured setting not to require the size of the ensemble
to scale linearly with $T$ -- which defeats the purpose of ensemble sampling --
while obtaining near $\sqrt{T}$ order regret. Ours is also the first result
that allows infinite action sets.
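
For orientation, the following is a minimal sketch of a generic ensemble sampling loop for a linear bandit, not the exact algorithm analysed in the paper. It assumes a finite action set (for simplicity only), Gaussian perturbations for both the initial estimates and the observed rewards, and ridge regression with parameter `lam`. The names `ensemble_sampling`, `lam`, and `noise_sd` are hypothetical and chosen for illustration. Each round, one of the $m$ ensemble members is drawn uniformly at random, the action that is greedy for that member is played, and every member is then updated with its own independently perturbed copy of the reward.

```python
import numpy as np

def ensemble_sampling(actions, theta_star, T, m, lam=1.0, noise_sd=0.1, seed=0):
    """Run a generic ensemble sampling loop; return cumulative pseudo-regret.

    Illustrative assumptions (not taken from the paper): finite action set,
    Gaussian reward noise, Gaussian perturbations, ridge parameter `lam`.
    """
    rng = np.random.default_rng(seed)
    n_actions, d = actions.shape

    # All members share one regularised design matrix V = lam * I + sum a a^T.
    V = lam * np.eye(d)
    # Each member j keeps its own response vector b[j], started from a random
    # perturbation so that the members disagree before any data arrives.
    b = np.sqrt(lam) * rng.standard_normal((m, d))

    means = actions @ theta_star          # true expected reward of each action
    best = means.max()
    regret = np.zeros(T)

    for t in range(T):
        j = rng.integers(m)                       # choose a member uniformly
        theta_j = np.linalg.solve(V, b[j])        # that member's ridge estimate
        idx = int(np.argmax(actions @ theta_j))   # act greedily for member j
        a = actions[idx]

        reward = means[idx] + noise_sd * rng.standard_normal()

        # Update the shared design matrix, then give every member the reward
        # plus its own independent perturbation.
        V += np.outer(a, a)
        z = noise_sd * rng.standard_normal(m)
        b += (reward + z)[:, None] * a[None, :]

        regret[t] = best - means[idx]

    return np.cumsum(regret)
```

A possible call, using an ensemble size that grows like $d \log T$ as in the stated result (the constant factor here is arbitrary): `ensemble_sampling(actions, theta_star, T=1000, m=int(np.ceil(d * np.log(1000))))`, where `actions` is an array of shape `(n_actions, d)`.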