An Optimal Elimination Algorithm for Learning a Best Arm

NIPS 2020

Abstract
We consider the classic problem of (ϵ,δ)-PAC learning a best arm, where the goal is to identify, with confidence 1-δ, an arm whose mean is an ϵ-approximation to that of the highest-mean arm in a multi-armed bandit setting. This problem is one of the most fundamental in statistics and learning theory, yet, somewhat surprisingly, its worst-case sample complexity is not well understood. In this paper, we propose a new approach for (ϵ,δ)-PAC learning a best arm. This approach leads to an algorithm whose sample complexity converges to exactly the optimal sample complexity of (ϵ,δ)-learning the mean of n arms separately, and we complement this result with a conditional matching lower bound. More specifically:
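
For readers new to the problem, the (ϵ,δ)-PAC guarantee can be made concrete with the textbook uniform-sampling baseline. This is not the paper's elimination algorithm, whose details are not given in this abstract. It is a minimal sketch that assumes rewards lie in [0, 1] and uses Hoeffding's inequality with a union bound, so each arm is pulled ceil((2/ϵ²)·ln(2n/δ)) times. The names `naive_pac_best_arm` and `make_bernoulli_bandit` are hypothetical helpers introduced only for illustration.

```python
import math
import random


def naive_pac_best_arm(pull, n_arms, epsilon, delta):
    """Textbook uniform-sampling baseline (NOT the paper's algorithm).

    Assumes every reward returned by pull(arm) lies in [0, 1].
    Returns (index of the empirically best arm, total number of pulls).
    """
    # Hoeffding: P(|estimate - mean| >= eps/2) <= 2 * exp(-m * eps^2 / 2).
    # Requiring this to be <= delta / n_arms and applying a union bound
    # gives m >= (2 / eps^2) * ln(2 * n_arms / delta) pulls per arm.
    m = math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * n_arms / delta))

    empirical_means = []
    for arm in range(n_arms):
        total = sum(pull(arm) for _ in range(m))
        empirical_means.append(total / m)

    # If all estimates are within eps/2 of the truth, the empirical best
    # arm has a true mean within eps of the highest true mean.
    best = max(range(n_arms), key=lambda a: empirical_means[a])
    return best, m * n_arms


def make_bernoulli_bandit(true_means, seed=0):
    """Hypothetical simulator: arm i pays 1 with probability true_means[i]."""
    rng = random.Random(seed)

    def pull(arm):
        return 1.0 if rng.random() < true_means[arm] else 0.0

    return pull


# Usage: three Bernoulli arms with assumed (made-up) means.
true_means = [0.2, 0.5, 0.9]
pull = make_bernoulli_bandit(true_means, seed=0)
best, total_pulls = naive_pac_best_arm(
    pull, len(true_means), epsilon=0.2, delta=0.1
)
print(best, total_pulls)
```

This baseline spends on the order of (n/ϵ²)·log(n/δ) pulls in total; the extra log n factor comes from the union bound over arms. The abstract's claim concerns how close an algorithm can get to the cost of (ϵ,δ)-learning the n means separately, which is the quantity this sketch is meant to give intuition for.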