Model Selection for Generic Contextual Bandits

IEEE TRANSACTIONS ON INFORMATION THEORY(2024)

引用 0|浏览4
暂无评分
摘要
We consider the problem of model selection for the general stochastic contextual bandits under the realizability assumption. We propose a successive refinement based algorithm called Adaptive Contextual Bandit (ACB), that works in phases and successively eliminates model classes that are too simple to fit the given instance. We prove that this algorithm is adaptive, i.e., the regret rate order-wise matches that of any provable contextual bandit algorithm, that needs the knowledge of the true model class. The price of not knowing the correct model class turns out to be only an additive term contributing to the second order term in the regret bound. This cost possess the intuitive property that it becomes smaller as the model class becomes easier to identify, and vice-versa. We also show that a much simpler explore-then-commit (ETC) style algorithm also obtains similar regret bound, despite not knowing the true model class. However, the cost of model selection is higher in ETC as opposed to in ACB, as expected. Furthermore, for the special case of linear contextual bandits, we propose specialized algorithms that obtain sharper guarantees compared to the generic setup.
更多
查看译文
关键词
Model selection,contextual bandits,linear bandits
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要