Optimal Regret with Limited Adaptivity for Generalized Linear Contextual Bandits

CoRR (2024)

Abstract
We study the generalized linear contextual bandit problem under the constraints of limited adaptivity. In this paper, we present two algorithms, B-GLinCB and RS-GLinCB, that address, respectively, two prevalent limited adaptivity models: batch learning with stochastic contexts and rare policy switches with adversarial contexts. For both models, we establish essentially tight regret bounds. Notably, in the obtained bounds, we manage to eliminate a dependence on a key parameter κ, which captures the non-linearity of the underlying reward model. For our batch learning algorithm B-GLinCB, with Ω(log log T) batches, the regret scales as Õ(√T). Further, we establish that our rarely switching algorithm RS-GLinCB updates its policy at most Õ(log² T) times and achieves a regret of Õ(√T). Our approach for removing the dependence on κ for generalized linear contextual bandits may be of independent interest.
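The abstract does not spell out the algorithms, but the rare-switching idea it refers to can be illustrated with a minimal sketch. The snippet below implements a generic rarely switching logistic (GLM) bandit that refits its parameter estimate only when the log-determinant of the regularized design matrix has grown by a fixed constant, the standard switching trigger from the rarely switching linear bandit literature. This is an assumption-laden illustration, not RS-GLinCB itself: the function names (`fit_logistic_mle`, `rarely_switching_glm_bandit`), the exploration-bonus constant, and the Newton-step MLE are hypothetical choices, and RS-GLinCB's actual switching criterion and κ-free confidence sets differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_mle(X, y, lam=1.0, iters=50):
    # Ridge-regularized logistic MLE via Newton's method (hypothetical helper).
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        hess = (X * (p * (1 - p))[:, None]).T @ X + lam * np.eye(d)
        theta -= np.linalg.solve(hess, grad)
    return theta

def rarely_switching_glm_bandit(contexts, pull, d, T, lam=1.0, log_C=np.log(2.0)):
    # Generic determinant-trigger sketch; NOT the paper's RS-GLinCB.
    V = lam * np.eye(d)                 # regularized design matrix
    _, logdet_last = np.linalg.slogdet(V)
    theta = np.zeros(d)
    X_hist, y_hist, switches = [], [], 0
    for t in range(T):
        arms = contexts(t)              # shape (K, d): one feature vector per arm
        V_inv = np.linalg.inv(V)
        # UCB-style score: estimated mean + exploration bonus ||x||_{V^{-1}}.
        bonus = np.sqrt(np.einsum('kd,de,ke->k', arms, V_inv, arms))
        x = arms[int(np.argmax(sigmoid(arms @ theta) + 0.5 * bonus))]
        y = pull(t, x)                  # observed Bernoulli reward in {0, 1}
        X_hist.append(x); y_hist.append(y)
        V += np.outer(x, x)
        _, logdet = np.linalg.slogdet(V)
        # Switch (refit) only when det(V) has grown by a factor exp(log_C).
        if logdet > logdet_last + log_C:
            theta = fit_logistic_mle(np.array(X_hist), np.array(y_hist), lam)
            logdet_last = logdet
            switches += 1
    return theta, switches
```

With a factor-2 determinant trigger, the estimate is refit only O(d log T) times over a horizon T when features are bounded, which is the kind of infrequent policy updating the abstract's Õ(log² T) switching bound refers to.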