A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits.

IEEE transactions on neural networks and learning systems(2022)

引用 2|浏览9
暂无评分
摘要
Upper confidence bound (UCB)-based contextual bandit algorithms require one to know the tail property of the reward distribution. Unfortunately, such tail property is usually unknown or difficult to specify in real-world applications. Using a tail property heavier than the ground truth leads to a slow learning speed of the contextual bandit algorithm, while using a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator (evaluated from historical rewards) for the contextual bandit UCB based on the multiplier bootstrap technique. Our proposed estimator mitigates the problem of specifying a heavier tail property by adaptively converging to the ground truth contextual bandit UCB (i.e., eliminating the impact of the specified heavier tail property) with theoretical guarantees on the convergence. The design and convergence analysis of the proposed estimator is technically nontrivial. The proposed estimator is generic and it can be applied to improve a variety of UCB-based contextual bandit algorithms. To demonstrate the versatility of the proposed estimator, we apply it to improve the linear reward contextual bandit UCB (LinUCB) algorithm resulting in our bootstrapping LinUCB (BootLinUCB) algorithm. We prove that the BootLinUCB has a sublinear regret. We conduct extensive experiments on both synthetic dataset and real-world dataset from Yahoo! to validate the benefits of our proposed estimator in reducing regret and the superior performance of BootLinUCB over the latest baseline.
更多
查看译文
关键词
Tail,Context modeling,Upper bound,Convergence,Program processors,Learning systems,Tuning,Contextual bandits,multiplier bootstrap,second-order correction,upper confidence bound (UCB)
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要