Combinatorial Gaussian Process Bandits With Probabilistically Triggered Arms

24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

Abstract
Combinatorial bandit models and algorithms are used in many sequential decision-making tasks, ranging from item list recommendation to influence maximization. Typical algorithms proposed for combinatorial bandits, including combinatorial UCB (CUCB) and combinatorial Thompson sampling (CTS), do not exploit correlations between base arms during the learning process. Moreover, their regret is usually analyzed under independent base arm outcomes. In this paper, we use Gaussian Processes (GPs) to model correlations between base arms. In particular, we consider a combinatorial bandit model with probabilistically triggered arms, and assume that the expected base arm outcome function is a sample from a GP. We assume that the learner has access to an exact computation oracle, which returns an optimal solution given expected base arm outcomes, and analyze the regret of the Combinatorial Gaussian Process Upper Confidence Bound (ComGP-UCB) algorithm for this setting. Under a (triggering probability modulated) Lipschitz continuity assumption on the expected reward function, we derive O(√(mT log T · γ^{PTA}_{T,μ})) and O(m√(T log T)/p*) upper bounds for the regret of ComGP-UCB that hold with high probability, where m denotes the number of base arms, p* denotes the minimum non-zero triggering probability, and γ^{PTA}_{T,μ} denotes the pseudo-information gain. Finally, we show via simulations that when the correlations between base arm outcomes are strong, ComGP-UCB significantly outperforms CUCB and CTS.
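The abstract describes the ComGP-UCB idea: maintain a GP posterior over base arm outcomes so that observations on one arm inform estimates of correlated arms, form upper confidence indices from the posterior, and feed those indices to an exact oracle to select a super arm. The sketch below is an illustrative simplification, not the paper's implementation: it uses an RBF kernel over hypothetical arm feature vectors, omits probabilistic triggering, and stands in a top-k selection for the exact oracle; all names and parameter values are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=1.0):
    # Squared-exponential kernel; arms with nearby features are correlated.
    d = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d / (2.0 * length_scale ** 2))

def gp_posterior(X_train, y_train, X_all, noise=0.1):
    # Standard GP regression posterior mean and per-arm variance.
    K = rbf_kernel(X_train, X_train) + noise ** 2 * np.eye(len(X_train))
    Ks = rbf_kernel(X_all, X_train)
    Kss = rbf_kernel(X_all, X_all)
    K_inv = np.linalg.inv(K)
    mu = Ks @ K_inv @ y_train
    var = np.diag(Kss - Ks @ K_inv @ Ks.T)
    return mu, np.maximum(var, 0.0)  # clip tiny negative values from round-off

def comgp_ucb(features, true_means, T=50, k=2, beta=2.0, noise=0.1, seed=0):
    """Simplified ComGP-UCB loop: each round, compute GP-based UCB indices
    for all base arms and let a top-k 'oracle' pick the super arm."""
    rng = np.random.default_rng(seed)
    m = len(features)
    X_hist, y_hist = [], []
    super_arm = np.arange(k)
    for _ in range(T):
        if X_hist:
            mu, var = gp_posterior(np.array(X_hist), np.array(y_hist), features)
        else:
            mu, var = np.zeros(m), np.ones(m)  # prior before any observations
        ucb = mu + np.sqrt(beta * var)  # optimistic index per base arm
        # Exact oracle stand-in: the optimal top-k super arm under the indices.
        super_arm = np.argsort(ucb)[-k:]
        for a in super_arm:  # observe noisy outcomes of the played base arms
            X_hist.append(features[a])
            y_hist.append(true_means[a] + noise * rng.standard_normal())
    return super_arm
```

Because a single observation updates the posterior at every correlated arm, the GP index shrinks uncertainty faster than independent per-arm confidence bounds, which is the mechanism behind the abstract's claim that ComGP-UCB outperforms CUCB and CTS when correlations are strong.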