Self-Imitation Learning via Generalized Lower Bound Q-learning

NIPS 2020.

It is of interest to study, in general, what bias could be beneficial to policy optimization and how to exploit such bias in practical reinforcement learning algorithms.

Abstract:

Self-imitation learning motivated by lower-bound Q-learning is a novel and effective approach for off-policy learning. In this work, we propose an n-step lower bound which generalizes the original return-based lower-bound Q-learning, and introduce a new family of self-imitation learning algorithms. To provide a formal motivation for the ...
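The idea of an n-step lower bound can be illustrated with a small, non-authoritative sketch. It assumes scalar (tabular) Q-values and borrows the clipped squared loss from the original return-based self-imitation learning formulation (Oh et al., 2018). Under that assumption, the target is the discounted sum of the first n observed rewards plus a discounted bootstrap value Q(x_n, a_n), and Q is pushed upward only when it falls below that target. The function names `n_step_lower_bound` and `self_imitation_loss` are hypothetical and do not come from the paper. The exact conditions under which the paper's bound holds are not reproduced here.

```python
from typing import Sequence


def n_step_lower_bound(rewards: Sequence[float], bootstrap_q: float, gamma: float) -> float:
    """Hypothetical helper: sum_{t<n} gamma^t * r_t + gamma^n * Q(x_n, a_n).

    rewards:     r_0, ..., r_{n-1} observed along a stored (behavior) trajectory
    bootstrap_q: an estimate of Q(x_n, a_n) at the pair reached after n steps
                 (assumption: read from the current or a target Q-function)
    gamma:       discount factor in [0, 1]
    """
    target = 0.0
    for t, r in enumerate(rewards):
        target += (gamma ** t) * r          # discounted reward at step t
    # Bootstrap from the Q-estimate n steps ahead.
    return target + (gamma ** len(rewards)) * bootstrap_q


def self_imitation_loss(q_value: float, lower_bound: float) -> float:
    """Hypothetical helper: 0.5 * max(lower_bound - Q, 0)^2.

    The loss is zero when Q already exceeds the lower bound, so only
    underestimated state-action values are pushed upward.
    """
    gap = max(lower_bound - q_value, 0.0)
    return 0.5 * gap * gap


if __name__ == "__main__":
    bound = n_step_lower_bound([1.0, 0.0, 2.0], bootstrap_q=4.0, gamma=0.5)
    print(bound)                             # 1 + 0 + 0.25*2 + 0.125*4 = 2.0
    print(self_imitation_loss(1.0, bound))   # Q below the bound: 0.5
    print(self_imitation_loss(3.0, bound))   # Q above the bound: 0.0
```

If the trajectory runs to a terminal state and the bootstrap value is set to 0 there, the target reduces to the Monte Carlo return used by return-based lower-bound Q-learning. Smaller n relies more heavily on the Q-estimate.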
