Empirical analysis of the convergence of Double DQN in relation to reward sparsity.

ICMLA (2022)

Abstract
Q-Networks are used in Reinforcement Learning to model the expected return from every action at a given state. When training Q-Networks, external reward signals are propagated back to the previously performed actions leading up to each reward. If many actions are required before a reward is experienced, the reward signal is distributed across all of those actions, with some actions having a greater impact on the reward than others. As the number of significant actions between rewards increases, the relative importance of each action decreases. If an action's importance becomes too small, its contribution might be overshadowed by noise in a deep neural network model, potentially causing convergence issues. In this work, we empirically test the limits of increasing the number of actions leading up to a reward in a simple grid-world environment. Our experiments show that even when the training error exceeds the reward signal attributed to each action, the model is still able to learn a sufficiently smooth value representation.
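For context, the bootstrapping mechanism the abstract refers to can be made explicit. The standard Double DQN target (a textbook formulation, not quoted from this paper; notation $s_t$, $a$, $r_t$, $\gamma$, $\theta$, $\theta^-$ is assumed here) uses the online network $Q_\theta$ to select the next action and the target network $Q_{\theta^-}$ to evaluate it:

$$
y_t \;=\; r_t \;+\; \gamma \, Q_{\theta^-}\!\Big(s_{t+1},\; \operatorname*{arg\,max}_{a}\, Q_{\theta}(s_{t+1}, a)\Big)
$$

When $r_t = 0$ along a long stretch of steps, each target is only a discounted copy of the next state's estimate, so a distant reward must survive many successive bootstrap updates before reaching early actions; this is the dilution effect whose limits the paper probes empirically.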
Keywords
reinforcement learning, deep q-learning, reward sparsity