Fast, scalable algorithms for reinforcement learning in high dimensional domains (2013)

Abstract
This thesis presents new algorithms for dealing with large scale reinforcement learning problems. Central to this work is the Atari 2600 platform, which acts as both a rich evaluation framework and a source of challenges for existing reinforcement learning methods. Three contributions are presented; common to all three is the idea of leveraging the highly structured nature of Atari 2600 games in order to achieve meaningful results.

The first part of this work formally introduces the notion of contingency awareness: the recognition that parts of an agent's observation are under its control, while others are solely determined by its environment. Together with this formalization, I provide empirical results showing that contingency awareness can be used to generate useful features for value-based reinforcement learning in Atari 2600 games.

The second part investigates the use of hashing in linear value function approximation. My work provides a new, theoretically sound hashing method for linear value function approximation based on prior work on sketches. Empirically, the new hashing method offers a significant performance advantage compared to traditional hashing, at a minuscule computational cost.

My third contribution is the quad-tree factorization (QTF) algorithm, an information-theoretic approach to the problem of predicting future Atari 2600 screens. The algorithm relies on the natural idea that future screens can be efficiently factored into image patches. QTF goes a step further by providing a hierarchical decomposition screen model, so that image patches are only as large as they need to be.

Together, the contributions in this thesis are motivated by the need to efficiently handle the Atari 2600's large observation space (the set of all possible game screens) in arbitrary Atari 2600 games. This work provides evidence that general, principled approximations can be devised to allow us to tackle the reinforcement learning problem within complex, natural domains.
Keywords
value-based reinforcement, future Atari, large observation space, arbitrary Atari, large scale reinforcement, scalable algorithm, linear value function approximation, image patch, contingency awareness, new algorithm, high dimensional domain, prior work