Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy
arXiv: Learning, 2019.
EI
Abstract:
Due to the high variance of policy gradients, on-policy optimization algorithms are plagued with low sample efficiency. In this work, we propose Augment-Reinforce-Merge (ARM) policy gradient estimator as an unbiased low-variance alternative to previous baseline estimators on tasks with binary action space, inspired by the recent ARM gradi...More
Code:
Data:
Tags
Comments