Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy

arXiv: Learning, 2019.

Abstract:

Due to the high variance of policy gradients, on-policy optimization algorithms are plagued by low sample efficiency. In this work, we propose the Augment-Reinforce-Merge (ARM) policy gradient estimator as an unbiased, low-variance alternative to previous baseline estimators on tasks with binary action spaces, inspired by the recent ARM gradi...
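For orientation, the following is a minimal sketch of the generic ARM identity for a single Bernoulli variable, which the abstract names as its inspiration. It is not the paper's policy gradient estimator, whose details are not given here. The sketch assumes z ~ Bernoulli(sigmoid(phi)) and a scalar function f; the names `arm_gradient`, `exact_gradient`, and the toy `f` are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def arm_gradient(f, phi, num_samples=100_000, seed=0):
    """Monte Carlo estimate of d/dphi E_{z ~ Bernoulli(sigmoid(phi))}[f(z)].

    Uses the ARM identity (assumed single-variable form):
      grad = E_{u ~ U(0,1)}[(f(1[u > sigmoid(-phi)]) - f(1[u < sigmoid(phi)])) * (u - 1/2)]
    """
    rng = random.Random(seed)
    upper = sigmoid(-phi)  # threshold for the "swapped" action
    lower = sigmoid(phi)   # threshold for the ordinary action
    total = 0.0
    for _ in range(num_samples):
        u = rng.random()
        z_swap = 1.0 if u > upper else 0.0
        z_main = 1.0 if u < lower else 0.0
        # Both actions share the same uniform draw u.
        total += (f(z_swap) - f(z_main)) * (u - 0.5)
    return total / num_samples


def exact_gradient(f, phi):
    """Closed-form gradient for one Bernoulli variable, used only as a check."""
    p = sigmoid(phi)
    return p * (1.0 - p) * (f(1.0) - f(0.0))


if __name__ == "__main__":
    f = lambda z: (z - 0.4) ** 2  # hypothetical toy objective
    print(arm_gradient(f, 0.3), exact_gradient(f, 0.3))
```

Because both evaluations of f reuse the same uniform draw, the term is exactly zero whenever the two thresholds select the same action, which gives some intuition for why the estimator can have low variance.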