Meta Learning in Bandits within Shared Affine Subspaces
arXiv (2024)
Abstract
We study the problem of meta-learning several contextual stochastic bandit
tasks by leveraging their concentration around a low-dimensional affine
subspace, which we learn via online principal component analysis to reduce the
expected regret over the encountered bandits. We propose and theoretically
analyze two strategies that solve the problem: one based on the principle of
optimism in the face of uncertainty, and the other via Thompson sampling. Our
framework is generic and includes previously proposed approaches as special
cases. Moreover, empirical results show that our methods significantly
reduce the regret on several bandit tasks.
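The abstract couples per-task bandit algorithms with online PCA for estimating the shared affine subspace. As a rough illustration of the subspace-estimation half only, here is a minimal Oja-style online PCA sketch; the data model, dimensions, and all variable names are our own assumptions for illustration, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: task parameter vectors in R^d concentrate near a
# k-dimensional affine subspace (offset mu_true + span of basis B_true).
d, k, n_tasks = 20, 3, 500
mu_true = rng.normal(size=d)
B_true, _ = np.linalg.qr(rng.normal(size=(d, k)))
thetas = (mu_true
          + rng.normal(size=(n_tasks, k)) @ B_true.T
          + 0.01 * rng.normal(size=(n_tasks, d)))  # small off-subspace noise

# Online estimates: running mean for the offset, Oja-style updates for the basis.
mu_hat = np.zeros(d)
B_hat, _ = np.linalg.qr(rng.normal(size=(d, k)))
for t, theta in enumerate(thetas, start=1):
    mu_hat += (theta - mu_hat) / t                # running mean of task parameters
    x = theta - mu_hat                            # centred sample
    B_hat += (1.0 / t) * np.outer(x, x @ B_hat)   # Oja update toward top-k directions
    B_hat, _ = np.linalg.qr(B_hat)                # re-orthonormalise the basis

# Subspace overlap: close to 1 when span(B_hat) captures span(B_true).
overlap = np.linalg.norm(B_hat.T @ B_true, ord="fro") ** 2 / k
print(round(overlap, 3))
```

A downstream bandit algorithm (optimistic or Thompson-sampling based) could then concentrate its exploration within the learned subspace rather than the full ambient space, which is the source of the regret reduction the abstract claims.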