An Efficient Bandit Algorithm for sqrt(T) Regret in Online Multiclass Prediction?

COLT (2009)

Abstract
Consider a sequence of examples $(x_t, y_t)$ for $t = 1, \ldots, T$, where $x_t \in \mathbb{R}^n$ and $y_t \in [K]$, and where the goal of a Learner is to predict the class $y_t$ from the input $x_t$. In the more common full-information setting, the Learner observes the true class $y_t$ after making her prediction $\hat{y}_t$. In the present open problem, however, we consider the so-called bandit setting: after predicting $\hat{y}_t$, the Learner is only told "correct" or "incorrect", her feedback being the single bit $\mathbb{1}[\hat{y}_t \neq y_t]$. We assume that the Learner's hypothesis class is the set of $K$-tuples of vectors $W = \langle w_1, \ldots, w_K \rangle$ where $w_i \in \mathbb{R}^n$ (we can think of $W$ as the $K \times n$ hypothesis matrix). Given an instance $x_t$, such a hypothesis produces a $K$-tuple of "scores" $\langle w_1 \cdot x_t, \ldots, w_K \cdot x_t \rangle$, and the Learner's prediction is the class with the largest score: $\hat{y}_t = \arg\max_{k \in [K]} w_k \cdot x_t$.
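
The prediction rule and the one-bit feedback described above can be sketched in a few lines of Python. This is only an illustration of the setting, not an algorithm from the paper: the function names `predict` and `bandit_feedback` are hypothetical, classes are indexed from 0 to K-1 rather than 1 to K, and ties in the arg max are assumed to be broken in favor of the lowest index.

```python
from typing import Sequence


def predict(W: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Return the index of the class with the largest score w_k . x.

    W is the K x n hypothesis matrix given as K rows of length n.
    Classes are 0-indexed here (an assumption; the abstract uses [K]).
    Ties go to the lowest index, because max() keeps the first maximum.
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]
    return max(range(len(scores)), key=lambda k: scores[k])


def bandit_feedback(y_hat: int, y: int) -> int:
    """Return the single feedback bit 1[y_hat != y]: 0 if correct, 1 if not."""
    return int(y_hat != y)


# Toy example with K = 3 classes and n = 2 features (made-up numbers).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [0.2, 0.9]
y_hat = predict(W, x)                 # scores are 0.2, 0.9, -1.1
bit_if_correct = bandit_feedback(y_hat, 1)
bit_if_wrong = bandit_feedback(y_hat, 2)
```

In the full-information setting the Learner would be handed `y` itself after each round; in the bandit setting only the bit returned by `bandit_feedback` is available, so a wrong prediction does not reveal which class was correct.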