Self-Imitation Learning via Generalized Lower Bound Q-learning
NIPS 2020.
It is of interest to study, in general, what bias could be beneficial to policy optimization and how to exploit such bias in practical reinforcement learning algorithms.
Self-imitation learning motivated by lower-bound Q-learning is a novel and effective approach for off-policy learning. In this work, we propose an n-step lower bound which generalizes the original return-based lower-bound Q-learning, and introduce a new family of self-imitation learning algorithms. To provide a formal motivation for the ...
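A minimal Python sketch of the idea, under stated assumptions. It assumes the n-step lower bound is an n-step discounted return along the stored trajectory plus a discounted bootstrap value at step n. It also assumes the self-imitation loss pushes Q(s, a) upward only when Q falls below that bound, as in return-based self-imitation learning. The function names `n_step_target` and `self_imitation_loss` and the exact bootstrap term are hypothetical illustrations, not the paper's implementation.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], bootstrap_q: float, gamma: float) -> float:
    """Hypothetical n-step lower-bound target (an assumed form, not the paper's code).

    Computes sum_{k=0}^{n-1} gamma^k * r_k + gamma^n * bootstrap_q, where
    n = len(rewards). Pass bootstrap_q = 0.0 when the episode terminates
    within the n steps.
    """
    n = len(rewards)
    target = 0.0
    for k, r in enumerate(rewards):
        target += gamma ** k * r
    return target + gamma ** n * bootstrap_q


def self_imitation_loss(q_pred: float, target: float) -> float:
    """Penalize Q only when it lies below the lower-bound target: 0.5 * max(target - Q, 0)^2."""
    gap = max(target - q_pred, 0.0)
    return 0.5 * gap ** 2


# Usage: two rewards of 1.0, a bootstrap value of 2.0, gamma = 0.5.
target = n_step_target([1.0, 1.0], bootstrap_q=2.0, gamma=0.5)  # 1 + 0.5 + 0.25 * 2 = 2.0
print(target, self_imitation_loss(1.0, target))  # prints "2.0 0.5"
```

Under these assumptions, setting n to the remaining episode length with `bootstrap_q=0.0` reduces the target to the full discounted Monte Carlo return. This is the return-based lower bound that the n-step version generalizes.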