
Discriminative Learning of Sum-Product Networks

NIPS 2012, pp. 3248-3256

Cited by: 238

Abstract

Sum-product networks (SPNs) are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We propose a backpropagation-style algorithm for computing the gradient of the conditional log-likelihood, in both a "soft" form based on marginal inference and a "hard" form based on MPE inference, and show that discriminative training admits a wider class of SPN architectures than generative training. On standard image classification benchmarks, discriminative SPNs achieve the best results then published on CIFAR-10 and STL-10.
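To make the training objective concrete, here is a hedged sketch reconstructed from the summary above (not quoted from the paper): write S(y, x) for the value of the SPN root with the class variable clamped to y and the evidence variables set to x. The conditional likelihood and the gradient that the backpropagation-style algorithm computes then take the following form:

    P(y \mid x) = \frac{S(y, x)}{\sum_{y'} S(y', x)}

    \frac{\partial}{\partial w} \log P(y \mid x)
        = \frac{1}{S(y, x)} \frac{\partial S(y, x)}{\partial w}
        - \frac{1}{\sum_{y'} S(y', x)} \frac{\partial}{\partial w} \sum_{y'} S(y', x)

The first term clamps Y to the true label; the second marginalizes Y out, which is the marginal inference that the "hard" variant replaces with MPE inference.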

Introduction
  • Probabilistic models play a crucial role in many scientific disciplines and real world applications.
  • SPNs are a deep architecture with full probabilistic semantics where inference is guaranteed to be tractable, under general conditions derived by Poon and Domingos [23].
  • Despite their tractability, SPNs are quite expressive [16], and have been used to solve difficult problems in vision [23, 1].
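To illustrate why inference stays tractable, below is a minimal Python sketch (an illustration, not the authors' code) of the upward pass that evaluates an SPN: every internal node is a weighted sum or a product of its children, so the probability of any evidence, including marginals, is computed in one pass that is linear in the number of edges. All class and variable names are invented for the example.

    class Leaf:
        """Indicator for one assignment of a variable; marginalized variables return 1."""
        def __init__(self, var, value):
            self.var, self.value = var, value
        def eval(self, evidence):
            if self.var not in evidence:      # variable is summed out
                return 1.0
            return 1.0 if evidence[self.var] == self.value else 0.0

    class Product:
        """Product node: children over disjoint variable scopes in this example."""
        def __init__(self, children):
            self.children = children
        def eval(self, evidence):
            result = 1.0
            for child in self.children:
                result *= child.eval(evidence)
            return result

    class Sum:
        """Sum node: weighted mixture of children over the same scope (completeness)."""
        def __init__(self, children, weights):
            self.children, self.weights = children, weights
        def eval(self, evidence):
            return sum(w * c.eval(evidence) for w, c in zip(self.weights, self.children))

    # A tiny SPN over two binary variables X1 and X2.
    x1, nx1 = Leaf("X1", 1), Leaf("X1", 0)
    x2, nx2 = Leaf("X2", 1), Leaf("X2", 0)
    root = Sum([Product([x1, x2]), Product([nx1, nx2])], [0.6, 0.4])

    print(root.eval({"X1": 1, "X2": 1}))  # joint P(X1=1, X2=1) = 0.6
    print(root.eval({"X1": 1}))           # marginal P(X1=1) = 0.6, same single pass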
Highlights
  • Probabilistic models play a crucial role in many scientific disciplines and real world applications
  • We observe that sum-product networks (SPNs) achieve higher accuracy than the best competing approach, learned pooling, while using half as many features. We hypothesize that this is because the SPN architecture lets us discriminatively train large movable parts: image structure that larger dictionaries cannot capture.
  • Sum-product networks are a new class of probabilistic model where inference remains tractable despite high treewidth and many hidden layers.
  • This paper introduced the first algorithms for learning SPNs discriminatively, using a form of backpropagation to compute gradients.
  • Discriminative training allows for a wider variety of SPN architectures than generative training, because completeness and consistency do not have to be maintained over evidence variables.
  • Experiments on image classification benchmarks illustrate the power of discriminative SPNs.
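As a hypothetical illustration of the discriminative use described above (the helper names are assumptions, not the paper's API): the conditional class distribution can be read off an SPN over (Y, X) by clamping Y to each class in turn, running the upward pass, and normalizing.

    def conditional(root, x_evidence, classes, class_var="Y"):
        """P(Y = y | x): one upward pass per class value, then normalize.
        `root.eval` is the evaluation routine from the sketch above."""
        scores = {}
        for y in classes:
            evidence = dict(x_evidence)
            evidence[class_var] = y
            scores[y] = root.eval(evidence)   # unnormalized S(y, x)
        total = sum(scores.values())          # S(x) = sum over y of S(y, x)
        return {y: s / total for y, s in scores.items()}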
Methods
  • Baselines compared on CIFAR-10 (see Table 3): Logistic Regression [24], SVM [5], SIFT [5], mcRBM [24], and mcRBM-DBN [24].
  • Convolutional RBM [10].
  • K-means (Triangle) [10]: 4,000 features on a 4x4 grid, 79.6%.
  • 3-Layer Learned RF [12]: 1,600 features on a 9x9 grid, 82.0%.
  • Learned Pooling [20].
Results
  • Results on CIFAR-10: the dataset consists of 32x32-pixel images, 50,000 for training and 10,000 for testing.
  • The authors observe that SPNs achieve higher accuracy than the best competing approach, learned pooling, while using half as many features.
  • They hypothesize that this is because the SPN architecture allows them to discriminatively train large movable parts: image structure that larger dictionaries cannot capture.
  • On STL-10, with K=1600, G=8, W=4, P=10, and T=3, the authors achieved 62.3% test accuracy (± 1.0% standard deviation across folds), the highest published result at the time of writing, including against approaches that use the unlabeled training images.
  • Just as with the features of Coates et al. [10], the authors anticipate that using an SPN in place of the SVM would help by learning spatial structure that the SVM cannot model.
Conclusion
  • Sum-product networks are a new class of probabilistic model where inference remains tractable despite high treewidth and many hidden layers.
  • Discriminative training allows for a wider variety of SPN architectures than generative training, because completeness and consistency do not have to be maintained over evidence variables.
  • The authors proposed both “soft” and “hard” gradient algorithms, using marginal inference in the “soft” case and MPE inference in the “hard” case.
  • Experiments on image classification benchmarks illustrate the power of discriminative SPNs.
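The "hard" update mentioned above can be sketched as a perceptron-style rule, reading MPE inference as a max-product pass that selects one child per sum node. This is a schematic reconstruction under assumed helper names (`mpe_counts`, `sum_edges`), not the paper's exact algorithm; the paper's Tables 1 and 2 give the precise inference procedures and weight updates.

    def hard_gradient_step(spn, x, y_true, learning_rate=0.1):
        """One schematic 'hard' discriminative update.
        `spn.mpe_counts(evidence)` is an assumed helper returning, for each
        sum-node edge, how often that edge lies on the MPE (max) tree."""
        counts_clamped = spn.mpe_counts({**x, "Y": y_true})  # label clamped to the truth
        counts_free = spn.mpe_counts(x)                      # label left to MPE inference
        for edge in spn.sum_edges:
            delta = counts_clamped[edge] - counts_free[edge]
            edge.weight += learning_rate * delta  # zero when MPE already recovers y_true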
Tables
  • Table 1: Inference procedures.
  • Table 2: Weight updates.
  • Table 3: Test accuracies on CIFAR-10.
  • Table 4: Comparison of average test accuracies on all folds of STL-10.
Funding
  • This research was partly funded by ARO grant W911NF-08-1-0242, AFRL contract FA8750-09-C-0181, NSF grant IIS-0803481, and ONR grant N00014-12-1-0312
References
  • [1] M. Amer and S. Todorovic. Sum-product networks for modeling activities with stochastic structure. In CVPR, 2012.
  • [2] F. Bach and M. I. Jordan. Thin junction trees. In Advances in Neural Information Processing Systems 14, pages 569-576, 2002.
  • [3] Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1-127, 2009.
  • [4] L. Bo, K. Lai, X. Ren, and D. Fox. Object recognition with hierarchical kernel descriptors. In CVPR, pages 1729-1736, 2011.
  • [5] L. Bo, X. Ren, and D. Fox. Kernel descriptors for visual recognition. In Advances in Neural Information Processing Systems, 2010.
  • [6] L. Bo, X. Ren, and D. Fox. Unsupervised feature learning for RGB-D based object recognition. In ISER, 2012.
  • [7] C. Boutilier, N. Friedman, M. Goldszmidt, and D. Koller. Context-specific independence in Bayesian networks. In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, pages 115-123, 1996.
  • [8] M. Chavira and A. Darwiche. On probabilistic inference by weighted model counting. Artificial Intelligence, 172(6-7):772-799, 2008.
  • [9] A. Chechetka and C. Guestrin. Efficient principled learning of thin junction trees. In Advances in Neural Information Processing Systems 20, 2008.
  • [10] A. Coates, H. Lee, and A. Y. Ng. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, 2011.
  • [11] A. Coates and A. Y. Ng. The importance of encoding versus training with sparse coding and vector quantization. In International Conference on Machine Learning, 2011.
  • [12] A. Coates and A. Y. Ng. Selecting receptive fields in deep networks. In NIPS, 2011.
  • [13] M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In EMNLP, pages 1-8, 2002.
  • [14] A. Darwiche. A differential approach to inference in Bayesian networks. Journal of the ACM, 50:280-305, 2003.
  • [15] A. Darwiche. Modeling and Reasoning with Bayesian Networks. Cambridge University Press, 2009.
  • [16] O. Delalleau and Y. Bengio. Shallow vs. deep sum-product networks. In Advances in Neural Information Processing Systems, 2011.
  • [17] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1-38, 1977.
  • [18] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, pages 1-8, 2008.
  • [19] A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Networks, 13(4-5):411-430, 2000.
  • [20] Y. Jia, C. Huang, and T. Darrell. Beyond spatial pyramids: Receptive field learning for pooled image features. In CVPR, 2012.
  • [21] A. Kulesza and F. Pereira. Structured learning with approximate inference. In Advances in Neural Information Processing Systems 20, pages 785-792, 2007.
  • [22] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 282-289, 2001.
  • [23] H. Poon and P. Domingos. Sum-product networks: A new deep architecture. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 337-346, 2011.
  • [24] M. Ranzato and G. E. Hinton. Modeling pixel means and covariances using factorized third-order Boltzmann machines. In CVPR, pages 2551-2558, 2010.
  • [25] J. Salojärvi, K. Puolamäki, and S. Kaski. Expectation maximization algorithms for conditional likelihoods. In Proceedings of the 22nd International Conference on Machine Learning, pages 752-759, 2005.