Online learning with side information.

IEEE Military Communications Conference (2017)

Abstract
An online learning problem with side information is considered. The problem is formulated as a graph-structured stochastic Multi-Armed Bandit (MAB). Each node in the graph represents an arm in the bandit problem, and an edge between two arms indicates closeness in their mean rewards. It is shown that such side information induces a Unit Interval Graph and that several graph properties can be leveraged to achieve a regret that is sublinear in the number of arms while preserving the optimal logarithmic regret in time. A lower bound on regret is established, and a hierarchical learning policy that is order-optimal in terms of both the number of arms and the learning horizon is developed.
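To make the graph formulation concrete, here is a minimal sketch of how closeness in mean rewards could induce a side-information graph. The threshold `eps`, the function names, and the example means are all hypothetical and are not taken from the paper; in the actual bandit problem the learner does not know the means and only observes the graph. Placing an interval of length `eps` around each mean, two intervals overlap exactly when the means differ by less than `eps`, which is why a graph of this kind is a unit interval graph.

```python
from itertools import combinations


def side_information_graph(means, eps):
    """Return the edge set {(i, j)}, i < j, joining arms whose means differ by less than eps.

    `eps` is an assumed closeness threshold used only for illustration.
    """
    return {
        (i, j)
        for i, j in combinations(range(len(means)), 2)
        if abs(means[i] - means[j]) < eps
    }


def neighbors(edges, arm):
    """List the arms adjacent to `arm` in the given edge set, in increasing order."""
    return sorted(j if i == arm else i for i, j in edges if arm in (i, j))


# Hypothetical mean rewards for five arms.
means = [0.10, 0.15, 0.40, 0.45, 0.90]
edges = side_information_graph(means, eps=0.1)
```

With these assumed values, arms 0 and 1 form one group of similar arms, arms 2 and 3 form another, and arm 4 has no neighbors. Grouping of this kind is what a learner could exploit to avoid exploring every arm separately.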
Keywords
Online Learning, Multi-Armed Bandits, Side Information, Unit Interval Graphs