Self-Attention-Based Temporary Curiosity in Reinforcement Learning Exploration

IEEE Transactions on Systems, Man, and Cybernetics: Systems (2021)

Cited by 6 | Views 145
Abstract
In many real-world scenarios, the extrinsic rewards provided by the environment are sparse, and an agent trained with a classic reinforcement learning algorithm fails to explore such environments sufficiently and effectively. To address this problem, an exploration bonus derived from environmental novelty can serve as intrinsic motivation for the agent. In recent years, curiosity-driven exploration has become a mainstream approach that characterizes environmental novelty through the prediction error of a dynamics model. Owing to the limited expressive ability of curiosity-based environmental novelty and the difficulty of finding an appropriate feature space, most curiosity-driven exploration methods suffer from overprotection against repetition. This problem reduces the efficiency of exploration and can trap the agent in a local optimum. In this article, we propose a framework that combines persisting curiosity with temporary curiosity to deal with the problem of overprotection against repetition. We introduce the self-attention mechanism from the field of computer vision and propose a sequence-based self-attention mechanism for generating temporary curiosity. We compare our framework with previous exploration methods in hard-exploration environments, provide a comprehensive analysis of the proposed framework, and investigate the effect of its individual components. The experimental results indicate that the proposed framework delivers superior performance to existing methods.
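
To make the idea concrete, below is a minimal sketch, not the paper's implementation, of how a persisting prediction-error bonus might be combined with a temporary bonus computed by self-attention over a short sequence of recent state embeddings. All names (ForwardDynamics, TemporaryCuriosity, intrinsic_reward), the window length, and the weighting coefficient beta are illustrative assumptions, as is the use of PyTorch's nn.MultiheadAttention in place of the paper's actual attention architecture.

# Illustrative sketch only: combines a forward-model (persisting) curiosity
# bonus with a self-attention-based (temporary) curiosity bonus. Module
# names and hyperparameters are assumptions, not the authors' design.
import torch
import torch.nn as nn

class ForwardDynamics(nn.Module):
    """Predicts the next state embedding from the current one and the action."""
    def __init__(self, feat_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + action_dim, 256),
            nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, feat: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([feat, action], dim=-1))

class TemporaryCuriosity(nn.Module):
    """Scores the newest embedding against a window of recent embeddings via
    self-attention; distance from the attended summary of the recent past is
    treated as temporary novelty."""
    def __init__(self, feat_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, seq_len, feat_dim); the last step is the current state.
        query = window[:, -1:, :]                       # current state as query
        attended, _ = self.attn(query, window, window)  # summary of recent past
        # A state that breaks the recent pattern sits far from the summary.
        return (query - attended).pow(2).mean(dim=(-2, -1))

def intrinsic_reward(dynamics, temp_curiosity, feats, actions, beta=0.5):
    """Persisting curiosity (forward-model error) + weighted temporary curiosity."""
    pred = dynamics(feats[:, -2, :], actions[:, -2, :])   # predict last state
    persisting = (pred - feats[:, -1, :]).pow(2).mean(dim=-1)
    temporary = temp_curiosity(feats)
    return persisting + beta * temporary

# Usage with dummy data: batch of 8 trajectories, window of 16 steps.
feats = torch.randn(8, 16, 64)    # state embeddings from some encoder
actions = torch.randn(8, 16, 6)   # actions taken at each step
dynamics = ForwardDynamics(64, 6)
temp = TemporaryCuriosity(64)
r_int = intrinsic_reward(dynamics, temp, feats, actions)
print(r_int.shape)                # torch.Size([8])

In this sketch the temporary bonus is high only when the current embedding deviates from the attention-weighted summary of its recent past, so states that merely repeat the recent sequence earn little extra reward; the paper's actual architecture and training details differ and are not reproduced here.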
Keywords
Attention mechanism, curiosity-driven, exploration, intrinsic reward, reinforcement learning