Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

International Conference on Learning Representations (2018)

Cited by 298 | Views 64
Abstract
Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
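To make the idea concrete, below is a minimal sketch of the paper's LAX-style estimator for a continuous (Gaussian) parameter, written in PyTorch. The toy black-box objective f, the surrogate network size, and the learning rates are illustrative assumptions, not details from the paper. A neural network c_phi serves as the control variate: the gradient estimate stays unbiased for every phi, so phi can be trained jointly with theta to minimize the estimator's variance.

```python
import torch

# Toy black-box objective: we only ever evaluate it, never differentiate it.
# (The quadratic form is an illustrative assumption, not from the paper.)
def f(b):
    return (b - 0.5) ** 2

theta = torch.zeros(1, requires_grad=True)  # mean of N(theta, 1)
c_phi = torch.nn.Sequential(                # learned surrogate / control variate
    torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
opt_theta = torch.optim.Adam([theta], lr=1e-2)
opt_phi = torch.optim.Adam(c_phi.parameters(), lr=1e-2)

for step in range(2000):
    eps = torch.randn(1)
    b = theta + eps                          # reparameterized sample b(theta, eps)
    logp = -0.5 * (b.detach() - theta) ** 2  # log N(b; theta, 1) up to a constant

    c = c_phi(b.unsqueeze(-1)).squeeze(-1)   # c_phi(b), differentiable in theta via b
    # Score-function and reparameterization terms, with graphs kept so the
    # estimate itself can later be differentiated w.r.t. phi.
    dlogp, = torch.autograd.grad(logp.sum(), theta, create_graph=True)
    dc, = torch.autograd.grad(c.sum(), theta, create_graph=True)

    # LAX estimate: g = [f(b) - c_phi(b)] * d/dtheta log p(b|theta) + d/dtheta c_phi(b).
    # Unbiased for every phi, since E[c * dlogp] = d/dtheta E[c].
    g = (f(b.detach()) - c) * dlogp + dc

    # Train phi to reduce the estimator's variance: because g is unbiased
    # for all phi, minimizing E[g^2] minimizes Var[g].
    phi_grads = torch.autograd.grad((g ** 2).sum(), list(c_phi.parameters()))
    for p, gp in zip(c_phi.parameters(), phi_grads):
        p.grad = gp
    opt_phi.step()

    # Apply the (detached) gradient estimate to theta.
    theta.grad = g.detach()
    opt_theta.step()
```

In the discrete setting the paper feeds a continuous relaxation of the discrete sample to c_phi instead of the sample itself, which recovers and generalizes the REBAR estimator.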
Keywords
estimation, control variates, black-box