Regret Minimization in Partially Observable Linear Quadratic Control

Lale Sahin,Azizzadenesheli Kamyar,Hassibi Babak,Anandkumar Anima

arxiv（2020）

引用 3|浏览67

暂无评分

摘要

We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori. We propose ExpCommit, an explore-then-commit algorithm that learns the model Markov parameters and then follows the principle of optimism in the face of uncertainty to design a controller. We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control. Finally, we provide stability guarantees and establish a regret upper bound of $\tilde{\mathcal{O}}(T^{2/3})$ for ExpCommit, where $T$ is the time horizon of the problem.

查看译文

关键词

regret,control

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要