Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing
NeurIPS, pp. 10015-10027, 2018.
program synthesis, natural language, weak supervision, semantic parsing, weighted sum
This paper presents MAPO: a novel policy optimization formulation that incorporates a memory buffer of promising trajectories to reduce the variance of policy gradient estimates for deterministic environments with discrete actions. The formulation expresses the expected return objective as a weighted sum of two terms: an expectation over ...