Deep Reinforcement Learning with Risk-Seeking Exploration.

Lecture Notes in Computer Science (2018)

Abstract
In most contemporary work in deep reinforcement learning (DRL), agents are trained in simulated environments. Not only are simulated environments fast and inexpensive, but they are also 'safe'. By contrast, training in a real-world environment (using robots, for example) is not only slow and costly, but actions can also result in irreversible damage, either to the environment or to the agent (robot) itself. In this paper, we take advantage of the inherent safety of computer simulation by extending the Deep Q-Network (DQN) algorithm with an ability to measure and take risk. In essence, we propose a novel DRL algorithm that encourages risk-seeking behaviour to enhance information acquisition during training. We demonstrate the merit of this exploration heuristic by (i) arguing that our risk estimator implicitly captures both the parametric uncertainty and the inherent uncertainty of the environment, propagated back through the Temporal Difference (TD) error across many time steps, and (ii) evaluating our method on three games in the Atari domain, showing that the technique works well on Montezuma's Revenge, a game that epitomises the challenge of sparse reward.
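The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: a DQN-style network with an auxiliary head whose output is trained, via a TD-style backup, on squared TD errors (a common proxy for return uncertainty) and then added as a risk-seeking exploration bonus during action selection. All names (RiskSeekingDQN, beta) and design choices here are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of risk-seeking exploration on top of DQN.
# Assumption: risk is an auxiliary per-action estimate trained on squared
# TD errors and bootstrapped through time, then used as an action bonus.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RiskSeekingDQN(nn.Module):
    """Q-network with an auxiliary head estimating per-action risk."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.q_head = nn.Linear(hidden, n_actions)     # expected return
        self.risk_head = nn.Linear(hidden, n_actions)  # risk estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.q_head(h), self.risk_head(h)

def select_action(net, obs, beta=0.5):
    # Risk-seeking: prefer actions whose estimated uncertainty is high.
    with torch.no_grad():
        q, risk = net(obs.unsqueeze(0))
        return int(torch.argmax(q + beta * risk, dim=1).item())

def loss_fn(net, target_net, batch, gamma=0.99):
    obs, act, rew, next_obs, done = batch  # done is a float mask (0/1)
    q, risk = net(obs)
    q_sa = q.gather(1, act.unsqueeze(1)).squeeze(1)
    risk_sa = risk.gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next, risk_next = target_net(next_obs)
        best = q_next.argmax(dim=1, keepdim=True)
        q_target = rew + gamma * (1 - done) * q_next.gather(1, best).squeeze(1)
        # Bootstrap the risk target through a TD-style backup so that
        # uncertainty accumulates over many time steps.
        td_err = q_target - q_sa
        risk_target = td_err.pow(2) + gamma * (1 - done) * risk_next.gather(1, best).squeeze(1)
    return F.mse_loss(q_sa, q_target) + F.mse_loss(risk_sa, risk_target)
```

In this sketch the exploration bonus beta trades off expected return against estimated uncertainty; setting beta to zero recovers greedy DQN action selection.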
Keywords
Deep reinforcement learning, Risk-sensitive, Exploration