Comparing Multi-Armed Bandit Algorithms and Q-learning for Multiagent Action Selection: a Case Study in Route Choice
2018 International Joint Conference on Neural Networks (IJCNN), 2018
Abstract
The multi-armed bandit (MAB) problem is concerned with an agent choosing which arm of a slot machine to play in order to maximize its reward. A family of reinforcement learning algorithms exists to tackle this problem, including non-stationary variants and a few variants that consider more than one agent (thus characterizing a repeated game). In this paper, we evaluate the performance of some of these MAB algorithms and compare them with Q-learning when applied to a non-stationary repeated game, in which commuter agents face the task of learning to choose a route that minimizes their travel times.
Keywords
multiarmed bandit algorithms, multiagent action selection, case study, route choice, multiarmed bandit problem, slot machine, reinforcement learning, nonstationary variants, MAB algorithms, Q-learning, nonstationary repeated game, commuter agents
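The setting described in the abstract can be illustrated with a minimal sketch. Nothing below is taken from the paper: the two-route network, the linear congestion cost in `travel_time`, the choice of epsilon-greedy as the bandit algorithm, the single-state (stateless) form of Q-learning, and all parameter values are assumptions made purely for illustration. The point it tries to show is why the game is non-stationary from each commuter's perspective: the reward of a route depends on how many other learners pick it in the same episode.

```python
import random


def travel_time(route, load):
    """Hypothetical linear congestion cost (an assumption, not from the paper)."""
    free_flow = {0: 10.0, 1: 15.0}  # travel time on an empty route
    slope = {0: 0.5, 1: 0.2}        # extra delay per commuter on the route
    return free_flow[route] + slope[route] * load


class EpsilonGreedyAgent:
    """Bandit learner keeping a sample-average reward estimate per route (arm)."""

    def __init__(self, n_routes, epsilon):
        self.epsilon = epsilon
        self.values = [0.0] * n_routes
        self.counts = [0] * n_routes

    def choose(self, rng):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if rng.random() < self.epsilon:
            return rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda r: self.values[r])

    def update(self, route, reward):
        # Incremental sample average: every past reward weighs equally.
        self.counts[route] += 1
        self.values[route] += (reward - self.values[route]) / self.counts[route]


class StatelessQAgent(EpsilonGreedyAgent):
    """Single-state Q-learning: a constant step size alpha favors recent rewards."""

    def __init__(self, n_routes, epsilon, alpha):
        super().__init__(n_routes, epsilon)
        self.alpha = alpha

    def update(self, route, reward):
        # Q(a) <- Q(a) + alpha * (r - Q(a)); no next-state term with one state.
        self.values[route] += self.alpha * (reward - self.values[route])


def simulate(agents, episodes, seed=0):
    """Repeated game: all agents choose at once, then observe their own travel time."""
    rng = random.Random(seed)
    history = []
    for _ in range(episodes):
        choices = [agent.choose(rng) for agent in agents]
        loads = [choices.count(route) for route in (0, 1)]
        times = [travel_time(c, loads[c]) for c in choices]
        for agent, choice, t in zip(agents, choices, times):
            agent.update(choice, -t)  # reward is the negative travel time
        history.append(sum(times) / len(times))  # mean travel time this episode
    return history
```

A mixed population can be run with, for example, `simulate([EpsilonGreedyAgent(2, 0.1) for _ in range(10)] + [StatelessQAgent(2, 0.1, 0.5) for _ in range(10)], episodes=200)`. The only difference between the two learners here is the update rule: the sample average weighs all past observations equally, while the constant step size lets the Q-learner track rewards that drift as other commuters change routes.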