A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
CoRR (2023)
Abstract
We investigate the fixed-budget best-arm identification (BAI) problem for
linear bandits in a potentially non-stationary environment. Given a finite arm
set $\mathcal{X} \subset \mathbb{R}^d$, a fixed budget $T$, and an unpredictable
sequence of parameters $\{\theta_t\}_{t=1}^{T}$, an
algorithm will aim to correctly identify the best arm $x^* :=
\arg\max_{x \in \mathcal{X}} x^\top \sum_{t=1}^{T} \theta_t$ with probability as
high as possible. Prior work has addressed the stationary setting where
$\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability
decreases as $\exp(-T/\rho^*)$ for a problem-dependent constant $\rho^*$. But
in many real-world A/B/n multivariate testing scenarios that motivate our
work, the environment is non-stationary and an algorithm expecting a stationary
setting can easily fail. For robust identification, it is well known that if
arms are chosen randomly and non-adaptively from a G-optimal design over
$\mathcal{X}$ at each time, then the error probability decreases as
$\exp(-T\Delta_{(1)}^2/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* -
x)^\top \frac{1}{T}\sum_{t=1}^{T} \theta_t$. As there exist environments where
$\Delta_{(1)}^2/d \ll 1/\rho^*$, we are motivated to propose a novel
algorithm $\mathsf{P1}$-$\mathsf{RAGE}$ that aims to obtain the best of both
worlds: robustness to non-stationarity and fast rates of identification in
benign settings. We characterize the error probability of
$\mathsf{P1}$-$\mathsf{RAGE}$ and demonstrate empirically that the algorithm
indeed never performs worse than G-optimal design but compares favorably to the
best algorithms in the stationary setting.
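
To make the robust baseline mentioned in the abstract concrete, the following is a minimal sketch of non-adaptive sampling from a G-optimal design, not of $\mathsf{P1}$-$\mathsf{RAGE}$ itself. Everything beyond the abstract is an assumption made for illustration: the function names `g_optimal_design` and `identify_best_arm` are hypothetical, the design is approximated with Fedorov-Wynn (Frank-Wolfe) iterations, the arms are assumed to span $\mathbb{R}^d$, and the estimate of $\frac{1}{T}\sum_{t=1}^{T}\theta_t$ is a simple unbiased one based on the design covariance, which may differ from the estimator analyzed in the paper.

```python
import numpy as np


def g_optimal_design(X, iters=2000, tol=1e-3):
    """Approximate a G-optimal design over the rows of X (assumed to span R^d).

    Uses Fedorov-Wynn (Frank-Wolfe) iterations; by the Kiefer-Wolfowitz
    theorem the optimum satisfies max_x x^T A(lam)^{-1} x = d.
    """
    n, d = X.shape
    lam = np.full(n, 1.0 / n)  # start from the uniform design
    for _ in range(iters):
        A = X.T @ (lam[:, None] * X)               # design covariance A(lam)
        A_inv = np.linalg.inv(A)
        g = np.einsum("ij,jk,ik->i", X, A_inv, X)  # x^T A^{-1} x for each arm
        i = int(np.argmax(g))
        if g[i] <= d * (1.0 + tol):                # close enough to optimal
            break
        step = (g[i] / d - 1.0) / (g[i] - 1.0)     # exact line-search step
        lam = (1.0 - step) * lam
        lam[i] += step
    return lam / lam.sum()


def identify_best_arm(X, thetas, rng, noise_sd=1.0):
    """Sample arms i.i.d. from the design and return the index of the guess.

    thetas has shape (T, d): one (possibly different) parameter per round.
    """
    T = len(thetas)
    lam = g_optimal_design(X)
    A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
    pulls = rng.choice(len(X), size=T, p=lam)      # non-adaptive sampling
    pulled = X[pulls]
    rewards = np.einsum("ij,ij->i", pulled, thetas)
    rewards = rewards + noise_sd * rng.standard_normal(T)
    # E[x_t x_t^T] = A(lam), so this is unbiased for (1/T) * sum_t theta_t.
    theta_hat = A_inv @ (pulled * rewards[:, None]).mean(axis=0)
    return int(np.argmax(X @ theta_hat))
```

Because the sampling distribution never depends on past rewards, the estimate stays unbiased for the average parameter however $\theta_t$ drifts, which is the robustness property the abstract attributes to G-optimal sampling. A call such as `identify_best_arm(X, thetas, np.random.default_rng(0))` returns the row index of the arm judged best.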
Keywords
linear bandits, robustness, best-arm, non-stationarity