Bandit-Feedback Online Multiclass Classification: Variants and Tradeoffs
CoRR(2024)
摘要
Consider the domain of multiclass classification within the adversarial
online setting. What is the price of relying on bandit feedback as opposed to
full information? To what extent can an adaptive adversary amplify the loss
compared to an oblivious one? To what extent can a randomized learner reduce
the loss compared to a deterministic one? We study these questions in the
mistake bound model and provide nearly tight answers.
We demonstrate that the optimal mistake bound under bandit feedback is at
most O(k) times higher than the optimal mistake bound in the full information
case, where k represents the number of labels. This bound is tight and
provides an answer to an open question previously posed and studied by Daniely
and Helbertal ['13] and by Long ['17, '20], who focused on deterministic
learners.
Moreover, we present nearly optimal bounds of Θ̃(k) on the gap
between randomized and deterministic learners, as well as between adaptive and
oblivious adversaries in the bandit feedback setting. This stands in contrast
to the full information scenario, where adaptive and oblivious adversaries are
equivalent, and the gap in mistake bounds between randomized and deterministic
learners is a constant multiplicative factor of 2.
In addition, our results imply that in some cases the optimal randomized
mistake bound is approximately the square-root of its deterministic parallel.
Previous results show that this is essentially the smallest it can get.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要