Momentum Boosted Episodic Memory for Improving Learning in Long-Tailed RL Environments

ICLR 2023

Abstract
Conventional Reinforcement Learning (RL) algorithms assume that the data distribution is uniform or nearly uniform. This is not the case in most real-world settings, such as autonomous driving or animals roaming in nature: a few objects are encountered frequently, while most of the remaining experiences occur rarely; the resulting distribution is called \emph{Zipfian}. Taking inspiration from the theory of \emph{complementary learning systems}, we propose an architecture for learning from Zipfian distributions in which long-tail states are discovered in an unsupervised manner and are retained longer, together with their recurrent activations, in episodic memory. The recurrent activations are then reinstated from episodic memory via a similarity search that assigns them weighted importance. The proposed architecture yields improved performance on a Zipfian task over conventional architectures. Our method outperforms IMPALA by a significant margin of 20.3\% when maps/objects occur with a uniform distribution and by 50.2\% on the rarest 20\% of the distribution.
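The reinstatement step described in the abstract can be illustrated with a minimal sketch: store state embeddings as keys and recurrent activations as values, evict common (low-rarity) entries first so long-tail states persist, and read out a similarity-weighted average of stored activations. The class name `EpisodicMemory`, the cosine top-k lookup, the softmax weighting, and the rarity-based eviction rule below are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class EpisodicMemory:
    """Sketch of an episodic memory with similarity-weighted reinstatement.

    Hypothetical interface: keys are state embeddings, values are the
    recurrent activations stored alongside them, and a per-entry rarity
    score (e.g. from an unsupervised long-tail detector) controls eviction.
    """

    def __init__(self, capacity, key_dim, value_dim):
        self.capacity = capacity
        self.keys = np.zeros((0, key_dim), dtype=np.float32)      # state embeddings
        self.values = np.zeros((0, value_dim), dtype=np.float32)  # recurrent activations
        self.rarity = np.zeros(0, dtype=np.float32)               # higher => long-tail

    def write(self, key, value, rarity_score):
        # Append the new entry; when full, evict the most *common* entry,
        # so rare (long-tail) states are kept longer, as the abstract describes.
        self.keys = np.vstack([self.keys, key[None]])
        self.values = np.vstack([self.values, value[None]])
        self.rarity = np.append(self.rarity, rarity_score)
        if len(self.rarity) > self.capacity:
            evict = int(np.argmin(self.rarity))
            self.keys = np.delete(self.keys, evict, axis=0)
            self.values = np.delete(self.values, evict, axis=0)
            self.rarity = np.delete(self.rarity, evict)

    def reinstate(self, query, k=8):
        # Cosine-similarity search over stored keys, then a softmax-weighted
        # average of the top-k recurrent activations ("weighted importance").
        q = query / (np.linalg.norm(query) + 1e-8)
        keys = self.keys / (np.linalg.norm(self.keys, axis=1, keepdims=True) + 1e-8)
        sims = keys @ q
        top = np.argsort(sims)[-k:]
        w = np.exp(sims[top] - sims[top].max())
        w /= w.sum()
        return w @ self.values[top]

# Hypothetical usage: fill the memory and reinstate a recurrent state
# for the current observation's embedding.
mem = EpisodicMemory(capacity=512, key_dim=32, value_dim=64)
rng = np.random.default_rng(0)
for _ in range(600):
    mem.write(rng.standard_normal(32).astype(np.float32),
              rng.standard_normal(64).astype(np.float32),
              rarity_score=rng.random())
h = mem.reinstate(rng.standard_normal(32).astype(np.float32), k=8)
```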
Keywords
long tail distribution,reinforcement learning,representation learning,contrastive learning,complementary learning system,hippocampus