Implicit Mixtures of Restricted Boltzmann Machines

NIPS (2008)

Cited 117 | Views 150
Abstract
We present a mixture model whose components are Restricted Boltzmann Machines (RBMs). This possibility has not been considered before because computing the partition function of an RBM is intractable, which appears to make learning a mixture of RBMs intractable as well. Surprisingly, when formulated as a third-order Boltzmann machine, such a mixture model can be learned tractably using contrastive divergence. The energy function of the model captures three-way interactions among visible units, hidden units, and a single discrete hidden variable that represents the cluster label. The distinguishing feature of this model is that, unlike other mixture models, the mixing proportions are not explicitly parameterized. Instead, they are defined implicitly via the energy function and depend on all the parameters in the model. We present results for the MNIST and NORB datasets showing that the implicit mixture of RBMs learns clusters that reflect the class structure in the data.

A conventional mixture is created by assigning a mixing proportion to each of the component models, and it is typically fitted using the EM algorithm, which alternates between two steps. The E-step uses property 1 to compute the posterior probability that each datapoint came from each of the component models; this posterior is also called the "responsibility" of each model for a datapoint. The M-step uses property 2 to update the parameters of each model so as to raise the responsibility-weighted sum of the log probabilities it assigns to the datapoints. The M-step also changes the mixing proportions of the component models to match the proportions of the training data that they are responsible for (a code sketch of the E-step appears below).

Restricted Boltzmann Machines [5] model binary data vectors using binary latent variables. They are considerably more powerful than mixtures of multivariate Bernoulli models because they allow many of the latent variables to be on simultaneously, so the number of alternative latent state vectors is exponential in the number of latent variables rather than linear in it, as it is for a mixture of Bernoullis. An RBM with N hidden units can be viewed as a mixture of 2^N Bernoulli models, one per latent state vector, with extensive parameter sharing between the 2^N component models and with the 2^N mixing proportions determined implicitly by the same parameters.
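To make the mixture-of-2^N-Bernoullis view concrete, here is a minimal sketch that expands a tiny RBM into its exponentially many Bernoulli components. The function name and the brute-force enumeration are illustrative assumptions, not code from the paper; enumeration is only feasible for very small N, which is exactly the point.

```python
from itertools import product

import numpy as np

def rbm_as_bernoulli_mixture(W, b, c):
    """Expand a tiny RBM into its 2^N Bernoulli mixture components.

    W: (d, N) visible-hidden weights; b: (d,) visible biases;
    c: (N,) hidden biases. Returns (mixing_proportions, means),
    one Bernoulli mean vector per hidden state vector h.
    """
    d, N = W.shape
    log_pis, means = [], []
    for h in product([0, 1], repeat=N):
        h = np.array(h, dtype=float)
        a = W @ h + b  # per-visible-unit input for this hidden state
        # Unnormalized log mixing proportion: log sum_v exp(-E(v, h))
        # = c.h + sum_i log(1 + exp(a_i)), computed stably via logaddexp.
        log_pis.append(c @ h + np.logaddexp(0.0, a).sum())
        means.append(1.0 / (1.0 + np.exp(-a)))  # p(v_i = 1 | h) = sigmoid(a_i)
    log_pis = np.array(log_pis)
    pis = np.exp(log_pis - log_pis.max())  # normalize in log space
    return pis / pis.sum(), np.array(means)

# Example: an RBM with N = 3 hidden units yields 2^3 = 8 Bernoulli components.
rng = np.random.default_rng(0)
pis, means = rbm_as_bernoulli_mixture(rng.normal(0, 0.1, (6, 3)),
                                      np.zeros(6), np.zeros(3))
print(pis.shape, means.shape)  # (8,) (8, 6)
```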
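The two-step EM procedure described above can likewise be sketched in a few lines. This is a generic E-step for any mixture whose components expose tractable log-probabilities; the function names are hypothetical. For RBM components these log-probabilities involve the intractable partition function, which is why a standard mixture of RBMs cannot be fitted this way.

```python
import numpy as np

def e_step_responsibilities(data, log_prob_fns, log_mixing):
    """E-step: p(component k | datapoint) for every datapoint.

    data:         (n, d) array of datapoints
    log_prob_fns: K callables, each mapping data -> (n,) log-probabilities
    log_mixing:   (K,) log mixing proportions
    Returns an (n, K) responsibility matrix whose rows sum to 1.
    """
    # log p(x, k) = log pi_k + log p(x | k)
    log_joint = np.stack(
        [lm + f(data) for lm, f in zip(log_mixing, log_prob_fns)], axis=1)
    # Normalize each row in log space (log-sum-exp) for stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)
```

The M-step would then refit each component with these responsibilities as weights and set each mixing proportion to the mean of the corresponding responsibility column.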
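Finally, the third-order energy function described in the abstract can be sketched directly. The following is a reconstruction under the simplifying assumption that the energy contains only the three-way term (the full model also has bias terms); W, v, h, and z name the weight tensor, visible vector, hidden vector, and one-hot cluster label. The key point is that the posterior over the cluster label normalizes over only K components, so the intractable partition function cancels and the "responsibility" is tractable even though each component is an RBM.

```python
import numpy as np

def energy(v, h, z, W):
    """E(v, h, z) = -sum_{i,j,k} W[i,j,k] v_i h_j z_k  (bias terms omitted)."""
    return -np.einsum('i,j,k,ijk->', v, h, z, W)

def component_free_energy(v, Wk):
    """F_k(v) = -log sum_h exp(v^T Wk h) = -sum_j log(1 + exp(v @ Wk[:, j]))."""
    return -np.logaddexp(0.0, v @ Wk).sum()

def cluster_posterior(v, W):
    """p(z_k = 1 | v): a softmax over component free energies.

    The global partition function cancels, so this normalizes over the
    K cluster labels only; the sum over each RBM's hidden states is done
    analytically inside the free energy.
    """
    F = np.array([component_free_energy(v, W[:, :, k])
                  for k in range(W.shape[2])])
    F -= F.min()  # shift for numerical stability
    p = np.exp(-F)
    return p / p.sum()
```

Picking a cluster label from this posterior and then applying contrastive divergence to the selected component's RBM is, roughly, how the implicit mixture sidesteps the explicitly parameterized mixing proportions.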
Keywords
EM algorithm, posterior probability, component model, Boltzmann machine, partition function, latent variable, mixture model