A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning.

Knowledge-Based Systems(2019)

引用 31|浏览134
暂无评分
摘要
Deep reinforcement learning (DRL) algorithms with experience replays have been used to solve many sequential learning problems. However, in practice, DRL algorithms still suffer from the data inefficiency problem, which limits their applicability in many scenarios, and renders them inefficient in solving real-world problems. To improve the data efficiency of DRL, in this paper, a new multi-step method is proposed. Unlike traditional algorithms, the proposed method uses a new return function, which alters the discount of future rewards while decreasing the impact of the immediate reward when selecting the current state action. This approach has the potential to improve the efficiency of reward data. By combining the proposed method with classic DRL algorithms, deep Q-networks (DQN) and double deep Q-networks (DDQN), two novel algorithms are proposed for improving the efficiency of learning from experience replay. The performance of the proposed algorithms, expected n-step DQN (EnDQN) and expected n-step DDQN (EnDDQN), are validated using two simulation environments, CartPole and DeepTraffic. The experimental results demonstrate that the proposed multi-step methods greatly improve the data efficiency of DRL agents while further improving the performance of existing classic DRL algorithms when incorporated into their training.
更多
查看译文
关键词
Deep reinforcement learning,Robotics,Multi-step methods,Data efficiency
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要