A Simple Approach For Non-Stationary Linear Bandits

International Conference on Artificial Intelligence and Statistics, Vol. 108 (2020)

Abstract
This paper investigates the problem of non-stationary linear bandits, where the unknown regression parameter evolves over time. Previous studies have adopted sophisticated mechanisms, such as sliding windows or weighted penalties, to achieve near-optimal dynamic regret. In this paper, we demonstrate that a simple restarted strategy is sufficient to attain the same regret guarantee. Specifically, we design a UCB-type algorithm to balance exploitation and exploration, and restart it periodically to handle the drift of the unknown parameter. With $T$ the time horizon, $d$ the dimension, and $P_T$ the path-length that measures the fluctuation of the evolving unknown parameter, our approach enjoys an $\widetilde{O}(d^{2/3}(1+P_T)^{1/3}T^{2/3})$ dynamic regret, which is nearly optimal: it matches the $\Omega(d^{2/3}(1+P_T)^{1/3}T^{2/3})$ minimax lower bound up to logarithmic factors. Empirical studies also validate the efficacy of our approach.
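The restarted strategy described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a ridge-regression UCB learner that discards all of its statistics every `epoch` rounds. The class name `RestartedLinUCB`, the regularizer `lam`, the fixed confidence width `beta`, the epoch length, and the toy two-arm environment in `run` are all hypothetical choices made for illustration. The abstract does not specify the confidence radius or the restart period.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


class RestartedLinUCB:
    """Illustrative ridge-regression UCB that restarts every `epoch` rounds.

    `lam`, `beta`, and `epoch` are assumed tuning parameters, not values
    taken from the paper.
    """

    def __init__(self, dim, epoch, lam=1.0, beta=1.0):
        self.dim, self.epoch, self.lam, self.beta = dim, epoch, lam, beta
        self.t = 0
        self.restarts = 0
        self._reset()

    def _reset(self):
        # V = lam * I, so its inverse is I / lam; b accumulates reward * feature.
        self.V_inv = [[1.0 / self.lam if i == j else 0.0
                       for j in range(self.dim)] for i in range(self.dim)]
        self.b = [0.0] * self.dim

    def select(self, arms):
        # Ridge estimate theta_hat = V^{-1} b, then pick the largest upper bound.
        theta_hat = mat_vec(self.V_inv, self.b)

        def ucb(x):
            width = math.sqrt(max(dot(x, mat_vec(self.V_inv, x)), 0.0))
            return dot(x, theta_hat) + self.beta * width

        return max(range(len(arms)), key=lambda i: ucb(arms[i]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of V^{-1} (V_inv is symmetric).
        Vx = mat_vec(self.V_inv, x)
        denom = 1.0 + dot(x, Vx)
        for i in range(self.dim):
            for j in range(self.dim):
                self.V_inv[i][j] -= Vx[i] * Vx[j] / denom
            self.b[i] += reward * x[i]
        self.t += 1
        # Periodic restart: forget all past data to track the drifting parameter.
        if self.t % self.epoch == 0:
            self._reset()
            self.restarts += 1


def run(T=300, epoch=50, seed=0):
    """Toy environment (assumed): the parameter switches once at T // 2."""
    rng = random.Random(seed)
    arms = [[1.0, 0.0], [0.0, 1.0]]
    agent = RestartedLinUCB(dim=2, epoch=epoch)
    regret = 0.0
    for t in range(T):
        theta = [1.0, 0.0] if t < T // 2 else [0.0, 1.0]
        means = [dot(a, theta) for a in arms]
        i = agent.select(arms)
        agent.update(arms[i], means[i] + rng.gauss(0.0, 0.1))
        # Dynamic regret compares against the best arm at each round.
        regret += max(means) - means[i]
    return agent, regret
```

Calling `run()` returns the agent and its cumulative dynamic regret on the toy problem. Shortening `epoch` makes the learner adapt faster after the switch, at the cost of re-exploring more often, which is the trade-off the restart period has to balance.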