Closing the Learning-Planning Loop with Predictive State Representations

Robotics: Science and Systems VI (2010)

Abstract
A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate environment model, and then plan to maximize reward. Unfortunately, learning algorithms often recover a model that is too inaccurate to support planning or too large and complex for planning to succeed; or they require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency. To address this gap, we propose a novel algorithm that provably learns a compact, accurate model directly from sequences of action-observation pairs. We then evaluate the learner by closing the loop from observations to actions. In more detail, we present a spectral algorithm for learning a predictive state representation (PSR), and evaluate it in a simulated, vision-based mobile robot planning task, showing that the learned PSR captures the essential features of the environment and enables successful and efficient planning. Our algorithm has several benefits that have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons; and our close-the-loop experiments provide an end-to-end practical test.