An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits.

COLT (2017)

Abstract
We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime, the strategy reduces the dependence of regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime, the regret guarantee remains unchanged.
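To make the terms "gap" and "randomized algorithm" concrete, the following is a minimal Python sketch of plain EXP3 (not EXP3++, and not the gap-estimation strategy of the paper) run on a Bernoulli bandit, followed by a naive empirical gap estimate. The function names `bernoulli_env`, `exp3`, and `empirical_gaps`, the anytime learning rate, and the arm means in the usage note are all illustrative assumptions and do not come from the paper.

```python
import math
import random


def bernoulli_env(means):
    """Hypothetical stochastic environment: arm i pays 1 with probability means[i]."""
    def reward_fn(arm, rng):
        return 1.0 if rng.random() < means[arm] else 0.0
    return reward_fn


def exp3(reward_fn, n_arms, horizon, seed=0):
    """Plain EXP3 with importance-weighted loss estimates (illustrative only)."""
    rng = random.Random(seed)
    cum_loss_est = [0.0] * n_arms  # importance-weighted cumulative loss estimates
    pulls = [0] * n_arms           # number of times each arm was played
    reward_sum = [0.0] * n_arms    # observed reward total per arm
    for t in range(1, horizon + 1):
        # Assumed anytime learning rate; other schedules are possible.
        eta = math.sqrt(math.log(n_arms) / (t * n_arms))
        # Subtract the minimum before exponentiating for numerical stability.
        base = min(cum_loss_est)
        weights = [math.exp(-eta * (l - base)) for l in cum_loss_est]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, rng)  # assumed to lie in [0, 1]
        # Only the played arm is updated, scaled by 1 / probability of playing it.
        cum_loss_est[arm] += (1.0 - reward) / probs[arm]
        pulls[arm] += 1
        reward_sum[arm] += reward
    return pulls, reward_sum


def empirical_gaps(pulls, reward_sum):
    """Naive gap estimate: best empirical mean minus each arm's empirical mean."""
    means = [s / n if n > 0 else 0.0 for s, n in zip(reward_sum, pulls)]
    best = max(means)
    return [best - m for m in means]
```

As a usage example, `pulls, reward_sum = exp3(bernoulli_env([0.9, 0.1]), n_arms=2, horizon=2000)` followed by `empirical_gaps(pulls, reward_sum)` gives a rough estimate of each arm's gap. In this two-armed instance the minimal gap $\Delta$ from the abstract is $0.9 - 0.1 = 0.8$.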