Efficient Exploration through Bayesian Deep Q-Networks

2018 Information Theory and Applications Workshop (ITA)

Abstract
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson-sampling-based Reinforcement Learning (RL) algorithm. Thompson sampling enables targeted exploration in high dimensions through posterior sampling, but is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network through a Bayesian Linear Regression (BLR) model, which can be trained with fast closed-form updates and whose posterior samples can be drawn efficiently from a Gaussian distribution. We apply our method to a wide range of Atari games in the Arcade Learning Environment. Because BDQN explores more efficiently, it reaches higher rewards substantially faster than a key baseline, the double deep Q-network (DDQN).
Keywords
Bayesian deep Q-networks, BDQN, practical Thompson sampling, reinforcement learning algorithm, targeted exploration, high dimensions, posterior sampling, Bayesian linear regression model, Arcade Learning Environment, Atari games, BLR, closed-form updates
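The core mechanism the abstract describes — a Bayesian linear regression over the network's last-layer features, with a closed-form Gaussian posterior from which weight samples are drawn for Thompson-sampling action selection — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the function names, the fixed noise variance `sigma_sq`, and the isotropic prior `prior_var` are assumptions for the example.

```python
import numpy as np

def blr_posterior(phi, y, sigma_sq=1.0, prior_var=1.0):
    """Closed-form BLR posterior over last-layer weights.

    phi: (n, d) feature matrix from the network's penultimate layer
    y:   (n,) regression targets (e.g. TD targets for one action)
    Returns the Gaussian posterior mean (d,) and covariance (d, d).
    """
    d = phi.shape[1]
    # Posterior precision = data term + isotropic prior term
    precision = phi.T @ phi / sigma_sq + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ phi.T @ y / sigma_sq
    return mean, cov

def thompson_action(features, posteriors, rng):
    """Thompson sampling: draw one weight vector per action from its
    Gaussian posterior, then act greedily on the sampled Q-values."""
    q = [features @ rng.multivariate_normal(m, c) for m, c in posteriors]
    return int(np.argmax(q))
```

In this sketch, each action maintains its own `(mean, cov)` posterior; exploration comes for free from the posterior draw rather than from epsilon-greedy noise, which is the efficiency the abstract attributes to BDQN.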