A framework of dual replay buffer: Balancing forgetting and generalization in reinforcement learning

Linjing Zhang, Zongzhang Zhang, Zhiyuan Pan, Yingfeng Chen, Jiangcheng Zhu, Zhaorong Wang, Meng Wang, Changjie Fan

Proceedings of the 2nd Workshop on Scaling Up Reinforcement Learning (SURL), International Joint Conference on Artificial Intelligence (IJCAI), 2019

Abstract
Experience replay buffers improve sample efficiency and training stability for recent deep reinforcement learning (DRL) methods. However, with the first-in-first-out (FIFO) retention policy widely used in plain experience replay buffers, forgetting and generalization become problems over long training runs because some experiences flow out of the buffer, especially when the buffer size is limited. As training progresses and exploration decreases, the experiences generated by the learned policy are confined to narrow regions of the state space, leading the policy to further fit the current experiences and forget the knowledge obtained from previous ones. In this paper, we propose a reservoir sampling dual replay buffer (RS-DRB) framework to alleviate the forgetting problem, which can be reflected in the generalization of the policy. In the RS-DRB framework, each experience is stored in one of two buffers, i.e., a buffer for exploration and a buffer for exploitation, according to how it was generated; experiences used for training are then sampled from the two buffers, which use different retention policies. We design an adaptive sampling ratio between the two buffers to balance the distribution over the state space. Empirical results show that RS-DRB achieves better training and generalization performance than FIFO and several other retention policies.
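The abstract describes the mechanism only at a high level. Below is a minimal Python sketch of how a dual-buffer scheme of this kind could be organized, assuming a reservoir-sampled buffer for exploratory transitions and a FIFO buffer for exploitative ones; the class and parameter names (`DualReplayBuffer`, `explore_ratio`, `was_exploratory`) are illustrative, and the paper's adaptive-ratio rule is not reproduced here.

```python
import random

class FIFOBuffer:
    """Plain FIFO retention: oldest experiences are evicted first."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []

    def add(self, experience):
        if len(self.data) >= self.capacity:
            self.data.pop(0)          # evict the oldest experience
        self.data.append(experience)

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

class ReservoirBuffer:
    """Reservoir-sampling retention: every experience seen so far is kept
    with equal probability, so old exploratory data is not systematically lost."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0

    def add(self, experience):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(experience)
        else:
            j = random.randrange(self.n_seen)
            if j < self.capacity:     # keep with probability capacity / n_seen
                self.data[j] = experience

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

class DualReplayBuffer:
    """Sketch of a two-buffer scheme: exploratory transitions go to a
    reservoir-sampled buffer, exploitative ones to a FIFO buffer, and
    minibatches mix the two according to an adjustable ratio."""
    def __init__(self, capacity, explore_ratio=0.5):
        self.explore_buf = ReservoirBuffer(capacity)
        self.exploit_buf = FIFOBuffer(capacity)
        self.explore_ratio = explore_ratio  # fraction of each batch from the exploration buffer

    def add(self, experience, was_exploratory):
        (self.explore_buf if was_exploratory else self.exploit_buf).add(experience)

    def set_ratio(self, ratio):
        # Placeholder for an adaptive rule; here the ratio is set externally.
        self.explore_ratio = min(max(ratio, 0.0), 1.0)

    def sample(self, batch_size):
        k_explore = int(batch_size * self.explore_ratio)
        batch = self.explore_buf.sample(k_explore)
        batch += self.exploit_buf.sample(batch_size - len(batch))
        return batch
```

In the paper the mixing ratio is adapted to balance the state-space distribution; the sketch leaves that rule external via `set_ratio`, so any concrete update scheme would have to be supplied by the caller.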