Leverage the Average: an Analysis of Regularization in RL

Tadashi Kozuno
Bruno Scherrer

Abstract:

Building upon the formalism of regularized Markov decision processes, we study the effect of Kullback-Leibler (KL) and entropy regularization in reinforcement learning. Through an equivalent formulation of the related approximate dynamic programming (ADP) scheme, we show that a KL penalty amounts to averaging q-values. This equivalence …
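The averaging claim can be illustrated with a small numeric sketch (not taken from the paper; the bandit setup, step size `eta`, and q-value sequence below are hypothetical). A KL-penalized mirror-descent policy update, pi_{k+1} ∝ pi_k · exp(eta · q_k), unrolls from a uniform prior into a softmax of the summed (i.e. scaled average of the) q-values, so the two formulations produce the same policy:

```python
import numpy as np

# Hypothetical 3-armed bandit: a sequence of q-value estimates q_0..q_{K-1}.
rng = np.random.default_rng(0)
n_actions, eta, K = 3, 0.5, 10
q_values = [rng.normal(size=n_actions) for _ in range(K)]

def softmax(x):
    z = np.exp(x - x.max())  # shift for numerical stability
    return z / z.sum()

# (1) KL-penalized update applied iteratively: pi_{k+1} ∝ pi_k * exp(eta * q_k).
pi = np.full(n_actions, 1.0 / n_actions)
for q in q_values:
    pi = pi * np.exp(eta * q)
    pi /= pi.sum()

# (2) Closed form after unrolling: softmax of the sum of all past q-values,
# i.e. a softmax of K times their average.
pi_avg = softmax(eta * np.sum(q_values, axis=0))

print(np.allclose(pi, pi_avg))  # the two policies coincide
```

The uniform prior cancels in the normalization, which is why the iterative KL-penalized scheme and the averaged-q softmax agree exactly in this sketch.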
