UCB and InfoGain Exploration via $\boldsymbol{Q}$-Ensembles.

arXiv: Learning (2017)

Abstract
We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the $Q$-learning setting. First, we propose an exploration strategy based on upper-confidence bounds (UCB). Next, we define an ``InfoGain'' exploration bonus, which depends on the disagreement of the $Q$-ensemble. Our experiments show significant gains on the Atari benchmark.
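The two ideas in the abstract can be sketched compactly: each ensemble head supplies a $Q$-value estimate, UCB-style action selection acts greedily with respect to the ensemble mean plus a scaled ensemble standard deviation, and the ``InfoGain'' idea turns head disagreement into an exploration bonus. Below is a minimal Python sketch, assuming the ensemble's outputs for one state are given as a `(heads, actions)` array; `ucb_lambda`, `bonus_scale`, and the variance-based bonus form are illustrative placeholders, not the paper's exact definitions.

```python
import numpy as np

def ucb_action(q_values, ucb_lambda=0.1):
    """Select the action maximizing mean + lambda * std across the ensemble.

    q_values: array of shape (heads, actions), one row per ensemble head.
    ucb_lambda: hypothetical exploration coefficient (a tunable
    hyperparameter, not a value taken from the paper).
    """
    mean = q_values.mean(axis=0)   # empirical mean over ensemble heads
    std = q_values.std(axis=0)     # empirical std as an uncertainty proxy
    return int(np.argmax(mean + ucb_lambda * std))

def disagreement_bonus(q_values, bonus_scale=0.01):
    """An InfoGain-style exploration bonus from ensemble disagreement.

    Uses the variance of the heads' Q-values as a stand-in for
    disagreement; the paper's exact bonus definition may differ.
    """
    return bonus_scale * q_values.var(axis=0).mean()

# Example: a 5-head ensemble over 4 actions (random stand-in values).
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 4))
print(ucb_action(q), disagreement_bonus(q))
```

The design intuition is the bandit one: the ensemble standard deviation plays the role of the UCB confidence radius, so actions whose value the heads disagree on are tried more often, and the bonus can be added to the reward to drive exploration during training.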