Non-Stationary Latent Auto-Regressive Bandits

Anna L. Trella, Walter Dempsey, Finale Doshi-Velez, Susan A. Murphy

CoRR (2024)

Abstract

We consider the stochastic multi-armed bandit problem with non-stationary rewards. We present a novel formulation of non-stationarity in the environment where changes in the mean reward of the arms over time are due to some unknown, latent, auto-regressive (AR) state of order k. We call this new environment the latent AR bandit. Different forms of the latent AR bandit appear in many real-world settings, especially in emerging scientific fields such as behavioral health or education where there are few mechanistic models of the environment. If the AR order k is known, we propose an algorithm that achieves Õ(k√(T)) regret in this setting. Empirically, our algorithm outperforms standard UCB across multiple non-stationary environments, even if k is mis-specified.
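To make the environment described above concrete, here is a minimal simulation sketch of a latent AR(k) state driving the mean rewards of the arms. The abstract does not give the exact model, so several details are assumptions made purely for illustration: a scalar latent state, Gaussian noise, zero initial states, and a linear link from the latent state to each arm's mean reward. The names `simulate_latent_ar`, `arm_mean_rewards`, `gammas`, `intercepts`, and `slopes` are hypothetical and do not come from the paper.

```python
import random


def simulate_latent_ar(gammas, noise_sd, horizon, seed=0):
    """Simulate a scalar latent AR(k) state (illustrative assumption):
    z_t = sum_j gammas[j] * z_{t-1-j} + Gaussian noise, with k = len(gammas)."""
    rng = random.Random(seed)
    k = len(gammas)
    history = [0.0] * k  # assumed zero initial states
    states = []
    for _ in range(horizon):
        # history[-1] is the most recent state, so gammas[0] multiplies z_{t-1}
        z = sum(g * history[-1 - j] for j, g in enumerate(gammas))
        z += rng.gauss(0.0, noise_sd)
        states.append(z)
        history = history[1:] + [z]  # keep only the last k states
    return states


def arm_mean_rewards(z, intercepts, slopes):
    """Assumed linear link: mean reward of arm a is intercepts[a] + slopes[a] * z."""
    return [b + w * z for b, w in zip(intercepts, slopes)]


# Two arms whose mean rewards move in opposite directions with the latent state.
states = simulate_latent_ar(gammas=[0.5, 0.3], noise_sd=0.1, horizon=200)
means = [arm_mean_rewards(z, intercepts=[0.0, 0.0], slopes=[1.0, -1.0]) for z in states]
best_arm = [max(range(2), key=lambda a: m[a]) for m in means]
```

In this toy setup the identity of the best arm changes whenever the latent state changes sign, which is the kind of non-stationarity that a stationary method such as standard UCB is not designed to track.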
