Mortal Multi-Armed Bandits

Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal

NIPS (2008)

Abstract

We formulate and study a new variant of the k-armed bandit problem, motivated by e-commerce applications. In our model, arms have a (stochastic) lifetime after which they expire. In this setting, an algorithm needs to continuously explore new arms, in contrast to the standard k-armed bandit model, in which arms are available indefinitely and exploration is reduced once an optimal arm is identified with near-certainty. The main motivation for our setting is online advertising, where ads have a limited lifetime due to, for example, the nature of their content and their campaign budgets. An algorithm needs to choose among a large collection of ads, more than can be fully explored within the typical ad lifetime. We present an optimal algorithm for the state-aware (deterministic reward function) case, and build on this technique to obtain an algorithm for the state-oblivious (stochastic reward function) case. Empirical studies on various reward distributions, including one derived from a real-world ad serving application, show that the proposed algorithms significantly outperform the standard multi-armed bandit approaches applied to these settings.
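The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it runs a plain epsilon-greedy policy, a standard bandit baseline, on arms that expire and are replaced. The names `MortalArm` and `simulate`, the geometric lifetime (each arm dies independently with probability `death_prob` per step), the uniform prior on arm means, and the Bernoulli rewards are all assumptions made for illustration.

```python
import random


class MortalArm:
    """Hypothetical Bernoulli arm with an unknown mean payoff."""

    def __init__(self, rng):
        # Assumption: mean payoffs are drawn uniformly from [0, 1].
        self.mean = rng.random()
        self.pulls = 0
        self.total = 0.0

    def pull(self, rng):
        # Stochastic (state-oblivious) reward: 1 with probability `mean`, else 0.
        reward = 1.0 if rng.random() < self.mean else 0.0
        self.pulls += 1
        self.total += reward
        return reward

    def estimate(self):
        # Empirical mean; arms that were never pulled are scored 0.
        return self.total / self.pulls if self.pulls else 0.0


def simulate(k=20, steps=5000, death_prob=0.01, epsilon=0.1, seed=0):
    """Average per-step reward of epsilon-greedy on k mortal arms."""
    rng = random.Random(seed)
    arms = [MortalArm(rng) for _ in range(k)]
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.choice(arms)                    # explore
        else:
            arm = max(arms, key=MortalArm.estimate)   # exploit
        total += arm.pull(rng)
        # Assumption: each arm expires independently with probability
        # death_prob and is immediately replaced by a fresh, unexplored arm.
        arms = [MortalArm(rng) if rng.random() < death_prob else a for a in arms]
    return total / steps


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.1):
        print(f"death_prob={p}: average reward {simulate(death_prob=p):.3f}")
```

Setting `death_prob=0.0` recovers the standard k-armed model with immortal arms, so comparing runs across values of `death_prob` gives a rough feel for how arm turnover forces continued exploration.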

Keywords

e-commerce, multi-armed bandit, online advertising, empirical study
