Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret.

arXiv: Learning (2019)

Abstract
We present the first computationally-efficient algorithm with $\widetilde{O}(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics. By that, we resolve an open question of Abbasi-Yadkori and Szepesvári (2011) and Dean, Mania, Matni, Recht, and Tu (2018).
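
For readers unfamiliar with the setting, the regret in the abstract can be sketched as follows. The notation ($A_\star$, $B_\star$, $Q$, $R$, $w_t$, $J^\star$) is the conventional one for linear-quadratic control and is an assumption here; the abstract itself does not fix any symbols.

$$x_{t+1} = A_\star x_t + B_\star u_t + w_t, \qquad c_t = x_t^\top Q x_t + u_t^\top R u_t,$$

$$R_T = \sum_{t=1}^{T} c_t - T \, J^\star,$$

where $x_t$ is the state, $u_t$ the control, $w_t$ the noise, and $J^\star$ the optimal long-run average cost attainable when $A_\star$ and $B_\star$ are known. Under this reading, a $\widetilde{O}(\sqrt{T})$ bound says that $R_T$ grows no faster than $\sqrt{T}$ up to polylogarithmic factors in $T$.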