Deep Randomized Least Squares Value Iteration

(2019)

Abstract
Exploration while learning representations is one of the main challenges Deep Reinforcement Learning (DRL) faces today. Since the learned representation depends on the observed data, the exploration strategy plays a crucial role. The popular DQN algorithm has significantly improved the ability of Reinforcement Learning (RL) algorithms to learn state representations from raw data, yet it uses a naive exploration strategy that is statistically inefficient. The Randomized Least Squares Value Iteration (RLSVI) algorithm (Osband et al., 2016), on the other hand, explores and generalizes efficiently via linearly parameterized value functions. However, it relies on hand-designed state representations that require prior engineering work for every environment. In this paper, we propose a Deep Learning adaptation of RLSVI. Rather than using a hand-designed state representation, we use one that is learned directly from the data by a DQN agent. Since the representation is optimized throughout the learning process, a key component of the proposed method is a likelihood matching mechanism that adapts to the changing representation. We demonstrate the importance of the various properties of our algorithm on a toy problem and show that our method outperforms DQN on five Atari benchmarks, reaching results competitive with the Rainbow algorithm.
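
The core idea described above combines RLSVI-style posterior sampling with features taken from a DQN. As a minimal sketch of that idea, the Python snippet below maintains a Bayesian linear regression per action over a fixed feature vector phi(s), standing in for the DQN's penultimate-layer activations, and draws one value-function sample per episode for Thompson-style exploration. The class name, prior and noise parameters are illustrative assumptions, and the paper's likelihood matching mechanism is not implemented here; this is not the authors' code.

import numpy as np

class BayesianLinearQ:
    # RLSVI-style exploration head: per-action Bayesian linear regression
    # over fixed features phi(s), e.g. a DQN's penultimate-layer activations.
    # Hypothetical sketch, not the authors' implementation.

    def __init__(self, feature_dim, num_actions, prior_var=1.0, noise_var=1.0):
        self.noise_var = noise_var
        # Gaussian posterior N(mu_a, precision_a^-1) over each action's weights.
        self.precision = [np.eye(feature_dim) / prior_var for _ in range(num_actions)]
        self.b = [np.zeros(feature_dim) for _ in range(num_actions)]
        self.mu = [np.zeros(feature_dim) for _ in range(num_actions)]
        self.sampled_w = [m.copy() for m in self.mu]

    def update(self, phi_s, action, target):
        # Standard Bayesian linear-regression update with Gaussian noise,
        # where `target` is a bootstrapped regression target for Q(s, action).
        self.precision[action] += np.outer(phi_s, phi_s) / self.noise_var
        self.b[action] += phi_s * target / self.noise_var
        self.mu[action] = np.linalg.solve(self.precision[action], self.b[action])

    def sample_weights(self):
        # Draw one randomized value function (typically once per episode).
        for a in range(len(self.mu)):
            cov = np.linalg.inv(self.precision[a])
            self.sampled_w[a] = np.random.multivariate_normal(self.mu[a], cov)

    def act(self, phi_s):
        # Greedy action under the currently sampled value function.
        q_values = [w @ phi_s for w in self.sampled_w]
        return int(np.argmax(q_values))

A typical loop would call sample_weights() at the start of each episode, act(phi_s) at every step, and update() with targets computed from a target network. The likelihood matching step, which carries the posterior across successive learned representations, is outside the scope of this sketch.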