Look Around! Unexpected gains from training on environments in the vicinity of the target
CoRR (2024)
Abstract
Solutions to Markov Decision Processes (MDPs) are often very sensitive to
state transition probabilities. As the estimation of these probabilities is
often inaccurate in practice, it is important to understand when and how
Reinforcement Learning (RL) agents generalize when transition probabilities
change. Here we present a new methodology to evaluate such generalization of RL
agents under small shifts in the transition probabilities. Specifically, we
evaluate agents in new environments (MDPs) in the vicinity of the training MDP
created by injecting quantifiable, parametric noise into the transition function
of the training MDP. We refer to this process as Noise Injection, and the
resulting environments as δ-environments. This process allows us to
create controlled variations of the same environment with the level of the
noise serving as a metric of distance between environments. Conventional wisdom
suggests that training and testing on the same MDP should yield the best
results. However, we report several cases of the opposite: when targeting a
specific environment, training the agent in an alternative noise setting can
yield superior outcomes. We showcase this phenomenon across 60 different
variations of Atari games, including PacMan, Pong, and Breakout.
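
The abstract does not specify the form of the injected noise, so the following is only a minimal sketch of the idea under an assumed noise model: each row of a tabular transition function is mixed with a uniform distribution, with the mixing weight δ acting as the distance from the training MDP. The function name `inject_noise`, the uniform-mixture model, and the toy 2-state, 2-action MDP are all illustrative assumptions, not the authors' method.

```python
import numpy as np


def inject_noise(P, delta):
    """Return a delta-environment transition tensor (assumed noise model).

    P has shape (n_states, n_actions, n_states), with P[s, a, s'] the
    probability of moving from s to s' under action a. Each row is mixed
    with a uniform distribution over next states:
        P_delta = (1 - delta) * P + delta * uniform
    """
    P = np.asarray(P, dtype=float)
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    n_next = P.shape[-1]
    uniform = np.full_like(P, 1.0 / n_next)
    # A convex combination of two distributions is again a distribution,
    # so every row of the result still sums to 1.
    return (1.0 - delta) * P + delta * uniform


# Hypothetical training MDP with 2 states and 2 actions.
P_train = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
])

# A nearby environment at noise level delta = 0.1.
P_delta = inject_noise(P_train, 0.1)
```

Under this assumed model, δ = 0 recovers the training MDP exactly and δ = 1 yields fully uniform transitions, so sweeping δ produces a controlled family of environments at increasing distance from the original.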