A Better Resource Allocation Algorithm with Semi-Bandit Feedback

ALT (2018)

Citations: 22 | Views: 47
Abstract
We study a sequential resource allocation problem among a fixed number of arms. In each round, the algorithm distributes a resource among the arms to maximize the expected success rate. Allocating more of the resource to an arm increases the probability that it succeeds, but only up to a cut-off. Following Lattimore et al. (2014), we assume that the success probability increases linearly with the allocation until it equals one, after which allocating more of the resource is wasteful. The cut-off values are fixed and unknown to the learner. We present an algorithm for this problem and prove a regret upper bound of O(log n), improving on the best previously known bound of O(log^2 n). We also prove lower bounds showing that our upper bound is tight. Simulations demonstrate the superiority of our algorithm.
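To make the linear-with-cut-off model concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's algorithm. It assumes the success probability of an arm is min(1, allocation / cutoff), that the total resource per round is a fixed budget (1.0 here), and that the cut-offs are known. The learner in the paper does not know them, so the greedy rule below only describes the known-cut-off benchmark that a learning algorithm would be compared against. The function names are hypothetical.

```python
def success_probability(allocation, cutoff):
    """Assumed model: probability grows linearly in the allocation, capped at 1."""
    return min(1.0, allocation / cutoff)


def greedy_allocation(cutoffs, budget=1.0):
    """Benchmark allocation when cut-offs are known (an assumption of this sketch).

    Arms with the smallest cut-off give the most success probability per unit
    of resource, so they are filled first; no arm receives more than its cut-off.
    """
    allocation = [0.0] * len(cutoffs)
    remaining = budget
    for k in sorted(range(len(cutoffs)), key=lambda i: cutoffs[i]):
        give = min(cutoffs[k], remaining)
        allocation[k] = give
        remaining -= give
        if remaining <= 0:
            break
    return allocation


def expected_successes(allocation, cutoffs):
    """Expected number of arms that succeed in one round."""
    return sum(success_probability(m, v) for m, v in zip(allocation, cutoffs))


if __name__ == "__main__":
    cutoffs = [0.5, 0.25, 1.0]  # hypothetical cut-off values
    alloc = greedy_allocation(cutoffs)
    print(alloc, expected_successes(alloc, cutoffs))
```

With these hypothetical cut-offs, the two cheapest arms are saturated and the leftover resource goes to the third arm. Allocating beyond an arm's cut-off would add nothing to its success probability, which is the waste the abstract refers to.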