Adaptive Regret for Bandits Made Possible: Two Queries Suffice
CoRR (2024)
Abstract
Fast changing states or volatile environments pose a significant challenge to
online optimization, which needs to perform rapid adaptation under limited
observation. In this paper, we give query and regret optimal bandit algorithms
under the strict notion of strongly adaptive regret, which measures the maximum
regret over any contiguous interval I. Due to its worst-case nature, there is
an almost-linear Ω(|I|^{1-ε}) regret lower bound when only one
query per round is allowed [Daniely et al., ICML 2015]. Surprisingly, with just
two queries per round, we give Strongly Adaptive Bandit Learner (StABL) that
achieves Õ(√(n|I|)) adaptive regret for multi-armed bandits with
n arms. The bound is tight and cannot be improved in general. Our algorithm
leverages a multiplicative update scheme of varying stepsizes and a carefully
chosen observation distribution to control the variance. Furthermore, we extend
our results and provide optimal algorithms in the bandit convex optimization
setting. Finally, we empirically demonstrate the superior performance of our
algorithms under volatile environments and for downstream tasks, such as
algorithm selection for hyperparameter optimization.
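The core idea described above (a multiplicative update with an extra observation query whose distribution controls the variance of importance-weighted loss estimates) can be illustrated with a toy sketch. This is NOT the paper's StABL algorithm, only a generic EXP3-style learner under the assumed two-query protocol: one query plays an arm and incurs its loss, the second query observes the loss of an arm drawn from a mixed observation distribution; the stepsize `eta` and mixing weight `gamma` are hypothetical parameters.

```python
import numpy as np

def two_query_mw_bandit(loss_matrix, eta=0.1, gamma=0.1, seed=0):
    """Illustrative two-query multiplicative-weights bandit sketch
    (not the paper's StABL). `loss_matrix` has shape (T, n) with
    per-round, per-arm losses in [0, 1]. Returns cumulative loss
    of the arms actually played."""
    rng = np.random.default_rng(seed)
    T, n = loss_matrix.shape
    w = np.ones(n)
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()
        # Query 1: the arm actually played; its loss is incurred.
        played = rng.choice(n, p=p)
        total_loss += loss_matrix[t, played]
        # Query 2: an extra observation drawn from a mixture of the
        # play distribution and uniform exploration. The uniform
        # component keeps 1/q bounded, controlling estimator variance.
        q = (1 - gamma) * p + gamma / n
        observed = rng.choice(n, p=q)
        # Importance-weighted loss estimate built from the second query.
        est = np.zeros(n)
        est[observed] = loss_matrix[t, observed] / q[observed]
        # Multiplicative update; normalize for numerical stability.
        w *= np.exp(-eta * est)
        w /= w.max()
    return total_loss
```

On a toy instance where one arm is always better, the learner's cumulative loss concentrates well below that of uniformly random play, which matches the role the observation distribution plays in variance control.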
Keywords
adaptive regret, multi-armed bandit