Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization
CoRR (2023)
Abstract
Restless multi-arm bandits (RMABs), a class of resource allocation problems
with broad application in areas such as healthcare, online advertising, and
anti-poaching, have recently been studied from a multi-agent reinforcement
learning perspective. Prior RMAB research suffers from several limitations,
e.g., it fails to adequately address continuous states and requires retraining
from scratch when arms opt in and opt out over time, a common challenge in many
real-world applications. We address these limitations by developing a neural
network-based pre-trained model (PreFeRMAB) that has general zero-shot ability
on a wide range of previously unseen RMABs, and which can be fine-tuned on
specific instances in a more sample-efficient way than retraining from scratch.
Our model also accommodates general multi-action settings and discrete or
continuous state spaces. To enable fast generalization, we learn a novel single
policy network model that utilizes feature information and employs a training
procedure in which arms opt in and out over time. We derive a new update rule
for a crucial λ-network with theoretical convergence guarantees and
empirically demonstrate the advantages of our approach on several challenging,
real-world inspired problems.
Keywords
zero shot learning, shot learning, multi-armed