Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits.

J. Mach. Learn. Res. (2022)

Abstract
We introduce GLR-klUCB, a novel algorithm for the piecewise i.i.d. non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, klUCB, with an efficient, parameter-free, change-point detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLR-klUCB does not need to be calibrated based on prior knowledge of the arms' means. We prove that this algorithm can attain a $O(\sqrt{T A \Upsilon_T \ln(T)})$ regret in $T$ rounds on some "easy" instances in which there is sufficient delay between two change-points, where $A$ is the number of arms and $\Upsilon_T$ the number of change-points, without prior knowledge of $\Upsilon_T$. In contrast with recently proposed algorithms that are agnostic to $\Upsilon_T$, we perform a numerical study showing that GLR-klUCB is also very efficient in practice, beyond easy instances.
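To make the change-point detector named in the abstract more concrete, the following is a minimal sketch of a Bernoulli generalized likelihood ratio statistic over a reward stream in [0, 1]. It is an illustration under stated assumptions, not the authors' implementation: the function names (`kl_bernoulli`, `glr_statistic`, `detects_change`) are hypothetical, and the `threshold` argument is left to the caller because the abstract does not specify how the paper calibrates it.

```python
import math


def kl_bernoulli(p, q, eps=1e-15):
    """Binary relative entropy kl(p, q), with clipping to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(rewards):
    """Maximum over split points s of
    s * kl(mean[:s], mean_all) + (n - s) * kl(mean[s:], mean_all).

    A large value suggests the mean changed somewhere in the stream.
    """
    n = len(rewards)
    if n < 2:
        return 0.0
    total = sum(rewards)
    mean_all = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += rewards[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        stat = (s * kl_bernoulli(mean_left, mean_all)
                + (n - s) * kl_bernoulli(mean_right, mean_all))
        best = max(best, stat)
    return best


def detects_change(rewards, threshold):
    """Flag a change when the statistic exceeds a caller-supplied threshold.

    The threshold value is an assumption of this sketch; the abstract
    does not state the calibration used in the paper.
    """
    return glr_statistic(rewards) > threshold


stationary = [0.5] * 40
shifted = [0.0] * 20 + [1.0] * 20
print(detects_change(stationary, 5.0), detects_change(shifted, 5.0))
```

In a bandit loop, one such test would plausibly be run per arm on the rewards observed since that arm's last restart, with a detection triggering a reset of the index policy's statistics; how GLR-klUCB schedules tests and restarts is described in the paper itself rather than in this abstract.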
Keywords
Multi-Armed Bandits, Change Point Detection, Non-Stationary Bandits