Collaborative Exploration and Exploitation in Massively Multi-Player Bandits

HAL (Le Centre pour la Communication Scientifique Directe), 2021

Abstract
In this paper, we propose an approach to optimize the performance of Internet of Things (IoT) networks. We formulate the optimization problem as a massive multi-player multi-armed bandit problem, where the devices are the players and the radio channels are the arms, with collisions possibly preventing message reception. To handle a realistic IoT network, we do not assume that sensing information is available (i.e. that collisions are observed), nor that the number of players is smaller than the number of arms. As the optimization problem is intractable, we propose two greedy policies: the first focuses on the number of successful communications, while the second also accounts for fairness between players. In order to implement an approximation of the targeted policies, we propose an explore-then-exploit approach, and establish a regret lower bound in Ω(T^{2/3} log T (N + K^{3/2})). For estimating the mean reward of arms, we propose a decentralized exploration algorithm with controlled information exchanges between players. We then show that the regret of the estimated target policy is optimal with respect to the time horizon T. Finally, we provide experimental evidence that the proposed algorithms outperform several baselines.
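The explore-then-exploit scheme described above can be illustrated with a toy simulation. The sketch below is not the authors' algorithm: it assumes Bernoulli arms, a shifted round-robin exploration phase, and a naive greedy commit step (one player per estimated-best arm); the function name and parameters (`explore_then_exploit`, `explore_frac`) are hypothetical. Consistent with the paper's setting, a collision (two players on the same arm) silently yields reward 0 and is not observed as such.

```python
import random

def explore_then_exploit(means, n_players, horizon, explore_frac=0.3, seed=0):
    """Toy explore-then-exploit for a multi-player bandit with collisions.

    Illustrative sketch only. Players explore in a shifted round-robin
    (collision-free while n_players <= len(means)), estimate arm means from
    their own rewards, then greedily commit to the estimated-best arms.
    Returns the total number of successful communications.
    """
    rng = random.Random(seed)
    k = len(means)
    t_explore = int(explore_frac * horizon)
    # Per-player reward statistics: sums[p][a], counts[p][a]
    sums = [[0.0] * k for _ in range(n_players)]
    counts = [[0] * k for _ in range(n_players)]
    total_successes = 0

    for t in range(horizon):
        if t < t_explore:
            # Exploration: shifted round-robin over the arms
            pulls = [(p + t) % k for p in range(n_players)]
        else:
            # Greedy commit on player 0's estimates (a stand-in for the
            # paper's controlled information exchange between players)
            est = [sums[0][a] / max(counts[0][a], 1) for a in range(k)]
            ranking = sorted(range(k), key=lambda a: -est[a])
            pulls = [ranking[p % k] for p in range(n_players)]
        # A message succeeds only if the player is alone on its channel
        occupancy = {}
        for a in pulls:
            occupancy[a] = occupancy.get(a, 0) + 1
        for p, a in enumerate(pulls):
            success = occupancy[a] == 1 and rng.random() < means[a]
            reward = 1 if success else 0  # collisions are NOT observed per se
            sums[p][a] += reward
            counts[p][a] += 1
            total_successes += reward
    return total_successes
```

Note that when `n_players` exceeds the number of arms, collisions are unavoidable in both phases, which is exactly the regime the abstract allows for and the reason a naive commit rule is suboptimal there.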
Keywords
exploration, exploitation, multi-player