Spectral MAB for Unknown Graph Processes.

European Signal Processing Conference(2018)

引用 1|浏览20
暂无评分
摘要
In this work, we study graph-based multi-arms bandit (MAB) problems aimed at optimizing actions on irregular and high-dimensional graphs. More formally, we consider a decision-maker that takes sequential actions over time and observes the experienced reward, defined as a function of a sparse graph signal. The goal is to optimize the action policy, which maximizes the reward experienced over time. The main challenges are represented by the system uncertainty (i.e., unknown parameters of the sparse graph signal model) and the high-dimensional search space. The uncertainty can be faced by online learning strategies that infer the system dynamics while taking the appropriate actions. However, the high-dimensionality makes online learning strategies highly inefficient. To overcome this limitation, we propose a novel graph-based MAB algorithm, which is data-efficient also in high-dimensional systems. The key intuition is to infer the nature of the graph processes by learning in the graph-spectral domain, and exploit this knowledge while optimizing the actions. In particular, we model the graph signal with a sparse dictionary-based representation and we propose an online sequential decision strategy that learns the parameters of the graph processes while optimizing the action strategy.
更多
查看译文
关键词
decision making,graph-based MAB algorithm,high-dimensional search space,sparse graph signal model,action policy,high-dimensional graphs,multiarms bandit problems,unknown graph processes,spectral MAB,action strategy,online sequential decision strategy,sparse dictionary-based representation,graph-spectral domain,online learning strategies
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要