Mixed experience sampling for off-policy reinforcement learning

Expert Systems with Applications (2024)

Abstract
In deep reinforcement learning, experience replay is commonly used to improve data efficiency and alleviate experience forgetting. However, online reinforcement learning is often influenced by the index of experiences, which frequently leads to unbalanced sampling. In addition, most experience replay methods ignore the differences among experiences and cannot make full use of them. In particular, many "near"-policy experiences that are highly relevant to the current policy are wasted, even though they are beneficial for improving sample efficiency. This paper theoretically analyzes the influence of various factors on experience sampling, and then proposes a sampling method for experience replay based on frequency and similarity (FSER) to alleviate unbalanced sampling and increase the value of the sampled experiences. FSER prefers experiences that are rarely sampled or highly relevant to the current policy, and thus plays a critical role in balancing experience forgetting against experience wasting. Finally, FSER is combined with TD3 to achieve state-of-the-art results on multiple tasks.
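The abstract does not give FSER's actual priority formula, so the following is only a minimal sketch of the general idea: assign each stored transition a priority that rises when it has been sampled rarely (frequency term) and when its stored action is close to what the current policy would produce in that state (similarity term). The class name FSERBuffer, the policy_fn callback, the alpha/beta weights, and the exp(-distance) similarity kernel are all illustrative assumptions, not the paper's method.

```python
import numpy as np

class FSERBuffer:
    """Hypothetical sketch of a frequency-and-similarity replay buffer.

    Transitions that have been sampled rarely, or whose stored action is
    close to the current policy's action in the same state, get higher
    sampling priority. The weighting below is assumed; the paper's exact
    formula is not given in the abstract.
    """

    def __init__(self, capacity, alpha=1.0, beta=1.0):
        self.capacity = capacity
        self.alpha = alpha      # weight of the frequency term (assumed)
        self.beta = beta        # weight of the similarity term (assumed)
        self.storage = []       # (state, action, reward, next_state, done)
        self.counts = []        # how often each transition has been sampled
        self.pos = 0

    def add(self, state, action, reward, next_state, done):
        data = (state, action, reward, next_state, done)
        if len(self.storage) < self.capacity:
            self.storage.append(data)
            self.counts.append(0)
        else:
            self.storage[self.pos] = data
            self.counts[self.pos] = 0
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, policy_fn):
        # Frequency term: rarely sampled transitions get higher priority,
        # which counteracts unbalanced sampling.
        counts = np.asarray(self.counts, dtype=np.float64)
        freq_term = 1.0 / (1.0 + counts)

        # Similarity term: transitions whose stored action matches the
        # current policy's action are treated as "near"-policy.
        states = np.stack([d[0] for d in self.storage])
        actions = np.stack([d[1] for d in self.storage])
        policy_actions = policy_fn(states)      # batched current-policy actions
        dist = np.linalg.norm(actions - policy_actions, axis=-1)
        sim_term = np.exp(-dist)                # assumed similarity kernel

        priority = self.alpha * freq_term + self.beta * sim_term
        probs = priority / priority.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        for i in idx:
            self.counts[i] += 1                 # record sampling frequency
        return [self.storage[i] for i in idx]
```

In a TD3-style training loop, such a buffer would replace uniform sampling: the critic/actor update draws its minibatch via sample(batch_size, actor), so the batch favors under-sampled and near-policy transitions rather than whatever the storage index happens to expose.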
Keywords
Reinforcement learning, Experience replay, Experience sampling, Off-policy learning, Exploitation