Combination of Auction Theory and Multi-Armed Bandits: Model, Algorithm, and Application

IEEE Transactions on Mobile Computing (2023)

Abstract
Multi-armed bandit (MAB) models have long attracted attention from multiple research communities because of their broad range of applications. Selection problems in which rewards are unknown in advance, such as ad recommendation in social networks and spectrum access in cognitive radio, can be solved efficiently with MAB models. In an MAB model, given $N$ arms whose rewards are unknown in advance, the player selects exactly one arm in each round, with the goal of maximizing the cumulative reward over a fixed horizon. A more general model, the combinatorial MAB (CMAB), allows $K$ arms to be played simultaneously in each round. However, existing CMAB models neglect the strategic behavior of the $N$ arms: an arm might report false information to increase its own profit. In many applications, such as user selection in crowdsensing, the arms are not unfeeling machines but rational individuals. To this end, we combine the upper confidence bound (UCB) with auction theory to develop a new algorithm called auction-based UCB (AUCB). We divide the auction-based CMAB problem into two sub-problems: winning-arm selection and payment computation. For AUCB, we derive an upper bound on regret and prove truthfulness in one round, individual rationality, and computational efficiency. In addition, we consider an extended setting in which some arms may be unavailable in some rounds and arms may bid inconsistently across rounds. We devise another algorithm, called eAUCB, for this setting. Extensive simulations are conducted to demonstrate the performance of the proposed algorithms.
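The split into winning-arm selection and payment computation can be illustrated with a minimal sketch. This is not the paper's AUCB: the abstract does not give the index or the payment rule, so the code assumes a standard UCB1-style exploration bonus, a ranking by UCB-per-unit-bid, and a critical-value payment set by the best losing arm. The names `ucb` and `select_and_pay` are hypothetical, and every arm is assumed to have been played at least once.

```python
import math


def ucb(mean, pulls, t):
    """UCB1-style index: empirical mean plus an exploration bonus.

    Assumption: the standard sqrt(2 ln t / n) bonus, not the paper's index.
    """
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def select_and_pay(means, pulls, bids, t, k):
    """One auction round: choose k winners, then compute their payments.

    Assumed rules (illustrative only):
      - selection: the k arms with the largest UCB-to-bid ratio win;
      - payment: each winner receives the highest bid with which it would
        still have beaten the best losing arm (a critical-value payment).
    """
    n = len(bids)
    if not 0 < k < n:
        raise ValueError("need 0 < k < number of arms")
    index = [ucb(means[i], pulls[i], t) for i in range(n)]
    order = sorted(range(n), key=lambda i: index[i] / bids[i], reverse=True)
    winners = order[:k]
    runner_up = order[k]  # best losing arm
    threshold = index[runner_up] / bids[runner_up]
    payments = {i: index[i] / threshold for i in winners}
    return winners, payments


means = [0.9, 0.5, 0.8]
pulls = [10, 10, 10]
bids = [2.0, 1.0, 4.0]
winners, payments = select_and_pay(means, pulls, bids, t=30, k=1)
```

Under these assumed rules, a winner's payment depends only on its own index and the runner-up's ratio, not on its own bid, and it is never below that bid. That is the intuition behind one-round truthfulness and individual rationality, though the paper's proofs concern its own mechanism.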
Keywords
Combinatorial multi-armed bandits, auction theory, strategic behaviors, truthfulness, regret bound, applications