Abstract and Explore: A Novel Behavioral Metric with Cyclic Dynamics in Reinforcement Learning

AAAI 2024(2024)

引用 0|浏览4
暂无评分
摘要
Intrinsic motivation lies at the heart of the exploration of reinforcement learning, which is primarily driven by the agent's inherent satisfaction rather than external feedback from the environment. However, in recent more challenging procedurally-generated environments with high stochasticity and uninformative extrinsic rewards, we identify two significant issues of applying intrinsic motivation. (1) State representation collapse: In existing methods, the learned representations within intrinsic motivation have a high probability to neglect the distinction among different states and be distracted by the task-irrelevant information brought by the stochasticity. (2) Insufficient interrelation among dynamics: Unsuccessful guidance provided by the uninformative extrinsic reward makes the dynamics learning in intrinsic motivation less effective. In light of the above observations, a novel Behavioral metric with Cyclic Dynamics (BCD) is proposed, which considers both cumulative and immediate effects and facilitates the abstraction and exploration of the agent. For the behavioral metric, the successor feature is utilized to reveal the expected future rewards and alleviate the heavy reliance of previous methods on extrinsic rewards. Moreover, the latent variable and vector quantization techniques are employed to enable an accurate measurement of the transition function in a discrete and interpretable manner. In addition, cyclic dynamics is established to capture the interrelations between state and action, thereby providing a thorough awareness of environmental dynamics. Extensive experiments conducted on procedurally-generated environments demonstrate the state-of-the-art performance of our proposed BCD.
更多
查看译文
关键词
ML: Reinforcement Learning,ROB: Behavior Learning & Control,ROB: Localization, Mapping, and Navigation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要