Learning Deep Inference Machines

J Andrew Bagnell, Alexander Grubb, Daniel M Munoz, Stephane Ross

mag(2013)

Abstract
Introduction. The traditional approach to structured prediction problems is to craft a graphical model structure, learn parameters for the model, and perform inference using an efficient (and usually approximate) inference approach such as graph cuts, belief propagation, or variational methods. Unfortunately, while remarkably powerful inference methods have been developed and substantial theoretical insight has been achieved, especially for simple potentials, the combination of learning and approximate inference for graphical models is still poorly understood and limited in practice. In computer vision, for instance, there is a common belief that more sophisticated representations and energy functions are necessary to achieve high performance, and these are difficult to handle with theoretically sound inference and learning procedures.

An alternate view is to consider approximate inference as a procedure: we can view an iterative procedure like belief propagation on a random field as a network of computational modules that take observations and the results of other local computations on the graph (messages), and produce intermediate output messages and final output classifications over the nodes of the random field. As a concrete example, belief propagation computes marginal distributions over variables by iteratively visiting all nodes in the graph structure and passing messages to neighbors which consist of "cavity marginals", i.e., a sequence of marginals with the effect of each neighbor removed. To train such an algorithm, we consider training a general classifier using standard supervised learning techniques such that the output of the classifier corresponds to these marginals given the input variables and messages (e.g., by minimizing the logistic loss). In this sense, the classifier is trained to approximate the computations that occur during belief propagation. Note, however, that in our case there is no graphical model of the data, i.e., there need not be any probabilistic model that corresponds to the computations performed by the classifier. The inference procedure is instead thought of as a black-box function that is trained to yield correct predictions. We note that our approach builds directly on recent work, most notably [1, 2, 3], reducing structured classification to a series of simpler supervised learning problems. Most work in this line trains relatively short sequences of classifiers making simple, non-iterative decisions. Our approach apes the structure of approximate inference algorithms to benefit from the proven success of iterative methods for decoding the best structured output, but trains the inference network directly to maximize performance.

Training Inference Machines. If we consider a variational inference or belief propagation method and instantiate one computational module for each computation of node beliefs, we end up with a tremendously deep network: in our results below, for instance, we have a network with depth O(10^5). Such networks are (generally) difficult to train using only gradient descent methods. Recent results have demonstrated the power of combining a training signal local to each module with a global objective function that coordinates behavior.

Building Inference. In many iterative inference procedures, there is a natural source of local supervision: simply attempting to predict at each node the ideal classification or regression output that we hope would result at the end of the inference procedure. For instance, in belief propagation, where the computations are over node beliefs, we can simply attempt to target the ideal, single-node supervised classification given
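The sketch below is a minimal, self-contained illustration of the idea described in the abstract, not the authors' implementation: a standard classifier is trained so that its outputs play the role of belief-propagation marginals, with the current predictions of neighboring nodes serving as incoming "messages." The function names (train_inference_machine, run_inference_machine, neighbor_summary), the use of scikit-learn's LogisticRegression, and the toy chain-structured data are all illustrative assumptions; the paper's experiments target much larger structured problems and far deeper unrollings.

```python
# Sketch of an "inference machine": a learned classifier stands in for the
# message computations of loopy belief propagation.  All names and the toy
# data are illustrative assumptions, not the authors' code.
import numpy as np
from sklearn.linear_model import LogisticRegression

def neighbor_summary(beliefs, adjacency):
    """Aggregate current neighbor beliefs into a fixed-length 'message' per node."""
    return np.vstack([beliefs[nbrs].mean(axis=0) for nbrs in adjacency])

def train_inference_machine(X, y, adjacency, n_classes, n_rounds=3):
    """Train one classifier per message-passing round on (features, messages) -> label."""
    n = X.shape[0]
    beliefs = np.full((n, n_classes), 1.0 / n_classes)  # uniform initial beliefs
    classifiers = []
    for _ in range(n_rounds):
        inputs = np.hstack([X, neighbor_summary(beliefs, adjacency)])
        clf = LogisticRegression(max_iter=1000).fit(inputs, y)  # logistic loss
        beliefs = clf.predict_proba(inputs)  # predictions become next round's messages
        classifiers.append(clf)
    return classifiers

def run_inference_machine(classifiers, X, adjacency, n_classes):
    """Unroll the learned modules: each round refines per-node marginal estimates."""
    beliefs = np.full((X.shape[0], n_classes), 1.0 / n_classes)
    for clf in classifiers:
        inputs = np.hstack([X, neighbor_summary(beliefs, adjacency)])
        beliefs = clf.predict_proba(inputs)
    return beliefs

if __name__ == "__main__":
    # Toy chain-structured problem: weak per-node evidence, spatially smooth labels.
    rng = np.random.default_rng(0)
    n, n_classes = 200, 2
    y = (np.sin(np.linspace(0.0, 6.0 * np.pi, n)) > 0).astype(int)
    X = y[:, None] + rng.normal(scale=1.5, size=(n, 1))
    adjacency = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
    clfs = train_inference_machine(X, y, adjacency, n_classes)
    marginals = run_inference_machine(clfs, X, adjacency, n_classes)
    print("train accuracy:", (marginals.argmax(axis=1) == y).mean())
```

Training one classifier per message-passing round against the true node labels is one simple way to realize the local training signal mentioned above; stacking the rounds yields the deep, unrolled inference network the abstract refers to.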