Reward-Based Exploration: Adaptive Control For Deep Reinforcement Learning

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS (2018)

Abstract
To address the trade-off between exploration and exploitation in deep reinforcement learning, this paper proposes a reward-based exploration strategy combined with Softmax action selection (RBE-Softmax), a dynamic exploration strategy that guides the agent's learning. The advantage of the proposed method is that it uses characteristics of the agent's learning process to adapt the exploration parameters online, so that the agent selects potentially optimal actions more effectively. The method is evaluated on discrete and continuous control tasks in OpenAI Gym, and the empirical results show that RBE-Softmax yields statistically significant improvements in the performance of deep reinforcement learning algorithms.
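The abstract does not give the update rule, so the following is only a minimal sketch of the general idea: Softmax (Boltzmann) action selection whose temperature is adjusted online from observed reward. The function names, the `adapt_temperature` rule, and all of its parameters (`rate`, `t_min`, `t_max`, the baseline reward) are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def softmax_probs(q_values, temperature):
    """Boltzmann (softmax) distribution over actions for a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the max value for numerical stability before exponentiating.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature, rng=random):
    """Sample an action index from the softmax distribution."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def adapt_temperature(temperature, recent_reward, baseline_reward,
                      rate=0.1, t_min=0.05, t_max=5.0):
    """HYPOTHETICAL rule, not the paper's: lower the temperature (explore
    less) when the recent reward beats the baseline, raise it otherwise."""
    if recent_reward > baseline_reward:
        temperature *= 1.0 - rate
    else:
        temperature *= 1.0 + rate
    # Clamp so the temperature stays in a usable range.
    return min(t_max, max(t_min, temperature))
```

A high temperature spreads probability across actions (more exploration), while a low temperature concentrates it on the highest-valued action (more exploitation); any reward-driven schedule, including the placeholder above, simply moves the agent along that range during training.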
Keywords
deep reinforcement learning, reward, exploration, exploitation