Memory Augmented Policy Optimization for Program Synthesis with Generalization
arXiv: Learning, Volume abs/1807.02322, 2018.
This paper presents Memory Augmented Policy Optimization (MAPO): a novel policy optimization formulation that incorporates a memory buffer of promising trajectories to reduce the variance of policy gradient estimates for deterministic environments with discrete actions. The formulation expresses the expected return objective as a weighted...
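The following is a minimal Python sketch of how a memory buffer can weight the expected return. It rests on several assumptions. Each trajectory is a string label with a fixed reward, which reflects the deterministic environment. The policy is a softmax over those labels, and the buffer B is a hypothetical set of labels. By the law of total expectation, E_{a~pi}[R(a)] = pi_B * E_{a~pi+}[R(a)] + (1 - pi_B) * E_{a~pi-}[R(a)]. Here pi_B is the policy's probability mass on B, and pi+ / pi- are the policy renormalized inside and outside B. The helper names (split_by_buffer, memory_weighted_estimate, and so on) are illustrative and are not the authors' code. The sketch also covers only the return estimate, not the gradient or the rest of MAPO's training procedure.

```python
import math
import random

# Hypothetical toy setup (not from the paper): each trajectory is a label for a
# complete action sequence, e.g. a candidate program. The environment is
# deterministic, so every trajectory has one fixed reward.


def softmax(logits):
    """Turn a dict {trajectory: logit} into a normalized policy pi_theta."""
    m = max(logits.values())
    exps = {a: math.exp(v - m) for a, v in logits.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}


def expected_return(policy, reward):
    """Exact E_{a ~ pi}[R(a)] by enumerating every trajectory."""
    return sum(p * reward[a] for a, p in policy.items())


def split_by_buffer(policy, reward, buffer):
    """Return (pi_B, E_{pi+}[R], E_{pi-}[R]) for a buffer B of trajectories.

    pi_B is the total policy probability of buffered trajectories; pi+ and pi-
    are the policy renormalized inside and outside the buffer.
    """
    pi_b = sum(policy[a] for a in buffer)
    inside = sum(policy[a] * reward[a] for a in buffer)
    outside = sum(p * reward[a] for a, p in policy.items() if a not in buffer)
    e_plus = inside / pi_b if pi_b > 0 else 0.0
    e_minus = outside / (1.0 - pi_b) if pi_b < 1 else 0.0
    return pi_b, e_plus, e_minus


def sample_outside(policy, buffer, rng):
    """Draw a ~ pi- by rejection sampling: resample until a is not buffered."""
    labels = list(policy)
    weights = [policy[a] for a in labels]
    while True:
        a = rng.choices(labels, weights=weights, k=1)[0]
        if a not in buffer:
            return a


def memory_weighted_estimate(policy, reward, buffer, rng, n_samples=4):
    """Estimate E[R]: buffer term enumerated exactly, outside term sampled."""
    exact_inside = sum(policy[a] * reward[a] for a in buffer)  # pi_B * E_{pi+}[R]
    if all(a in buffer for a in policy):
        return exact_inside  # nothing left to sample
    pi_b = sum(policy[a] for a in buffer)
    samples = [reward[sample_outside(policy, buffer, rng)] for _ in range(n_samples)]
    # Only this (1 - pi_B)-weighted Monte Carlo term contributes variance.
    return exact_inside + (1.0 - pi_b) * sum(samples) / n_samples


if __name__ == "__main__":
    logits = {"prog_a": 2.0, "prog_b": 0.5, "prog_c": 0.0, "prog_d": -1.0}
    reward = {"prog_a": 1.0, "prog_b": 0.0, "prog_c": 0.0, "prog_d": 0.0}
    policy = softmax(logits)
    buffer = {"prog_a"}  # hypothetical buffer holding the one rewarded program
    pi_b, e_plus, e_minus = split_by_buffer(policy, reward, buffer)
    print("exact E[R]:", expected_return(policy, reward))
    print("decomposed:", pi_b * e_plus + (1 - pi_b) * e_minus)
    print("estimate:  ", memory_weighted_estimate(policy, reward, buffer, random.Random(0)))
```

In this toy setup the only rewarded trajectory sits in the buffer. The buffer term is therefore computed exactly, and every rejection sample drawn outside B has reward 0. As a result, the estimate equals the true expected return with no sampling variance. In general, sampling is needed only for the remainder weighted by (1 - pi_B), which shrinks as the buffer captures more of the policy's probability mass.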