Deep compression of probabilistic graphical networks

Pattern Recognition, 2019, 106979.

Keywords:
Deep Belief Networks, Variational Auto-encoders, Deep compression, machine learning

Abstract:

Probabilistic Graphical Models (PGMs) are an important and active research area in machine learning and artificial intelligence. The well-known representatives of PGMs include Restricted Boltzmann Machines (RBMs), Deep Belief Networks (DBNs), Deep Boltzmann Machines (DBMs), and their variants. These PGMs open a new dimension of machine learning […]

Introduction
  • Probabilistic Graphical Models (PGMs) play a significant role in modern machine learning; their nodes represent random variables and their connections denote the statistical dependencies between nodes.
  • PGMs offer substantial advantages in data representation and have achieved great success in pattern recognition and artificial intelligence, such as image recognition [1] and segmentation [2], and human motion generation [3] and recognition [4].
  • There are three advantageous properties of PGMs. The first is that they provide a simple way to specify the structure of a probabilistic model via a graph representation; a standard example of such a model is sketched below.
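The Restricted Boltzmann Machine is the basic building block of the DBNs and DBMs studied in the paper. As a generic illustration only (standard textbook notation, not the paper's own), an RBM over visible units v and hidden units h, with weight matrix W and bias vectors b and c, defines

```latex
% Standard RBM energy and joint distribution (generic illustration, not the paper's notation):
E(\mathbf{v}, \mathbf{h}) = -\mathbf{b}^\top \mathbf{v} - \mathbf{c}^\top \mathbf{h} - \mathbf{v}^\top \mathbf{W} \mathbf{h},
\qquad
P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z}\exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr),
\qquad
Z = \sum_{\mathbf{v}, \mathbf{h}} \exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr).
```

Pruning such a model amounts to zeroing out entries of the weight matrix W while leaving the graph's node set unchanged.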
Highlights
  • Probabilistic Graphical Models (PGMs) play a significant role in modern machine learning; their nodes represent random variables and their connections denote the statistical dependencies between nodes
  • We demonstrate the classification rates of the compressed networks applied to Probabilistic Graphical Models, where the Probabilistic Graphical Networks are not pruned in a layer-by-layer manner
  • Inspired by the parameter redundancy in deep deterministic neural networks, such as convolutional neural networks, the same problem is confirmed for deep Probabilistic Graphical Networks in this paper
  • We use a percentage- and magnitude-based pruning method, which is simple and widely used in deep compression techniques, to fill the gap between deep compression techniques and deep Probabilistic Graphical Networks (a minimal code sketch follows this list)
  • More than 50% of their weight connections can be removed on the MNIST, Fashion-MNIST and CIFAR-10 datasets, while retaining or even improving their generative and discriminative capabilities
  • The experiments show a spectrum of pruning percentages for deep Probabilistic Graphical Networks, and the capabilities of these models vary continuously with the pruning percentage, so a compromise can be reached between model capacity and redundancy
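The pruning code itself is not reproduced on this page; the following is a minimal sketch of percentage- and magnitude-based pruning in PyTorch (the function name, shapes and usage are illustrative assumptions, not the authors' implementation). The returned mask can be reapplied after each retraining update so that pruned connections stay at zero.

```python
import torch

def magnitude_prune(W, percentage):
    """Zero out the smallest-magnitude `percentage` of weights in W.

    W          -- 2-D tensor of connection weights, e.g. an RBM's
                  visible-to-hidden weight matrix
    percentage -- fraction of weights to remove (0.5 removes 50%)
    Returns the pruned weights and a binary mask for use during retraining.
    """
    k = max(1, int(percentage * W.numel()))            # number of weights to drop
    threshold = W.abs().flatten().kthvalue(k).values   # k-th smallest magnitude
    mask = (W.abs() > threshold).float()               # keep only larger weights
    return W * mask, mask

# Illustrative usage on a random 784x500 weight matrix (MNIST-sized RBM):
W = 0.01 * torch.randn(784, 500)
W_pruned, mask = magnitude_prune(W, 0.5)
print(f"remaining connections: {mask.mean().item():.2%}")
```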
Results
  • The authors use the developed pruning and retraining approach to evaluate the parameter redundancy of deep PGNs and their compression performance.
  • For DBNs and DBMs, the raw and pruned generative models are used to initialize the corresponding discriminative models, whose initialization and recognition capabilities are then evaluated.
  • For these models, the latent nodes at the highest layer are used as input features for a logistic regression classifier to construct the corresponding recognition models (a sketch of this step follows this list).
  • The code is implemented in PyTorch 0.4.0, and the experiments run on a workstation with an Intel Xeon(R) E5-2640 v4 CPU @ 2.40 GHz × 40, 32 GB of memory and a Tesla K40 GPU.
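As a rough sketch of that evaluation step (the feature extractor, data arrays and layer sizes below are placeholders, not the authors' code or data), the top-layer activation probabilities can be fed to scikit-learn's logistic regression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def rbm_hidden_probs(X, W, b_hidden):
    """Hidden-unit activation probabilities of a binary RBM: sigmoid(X W + b)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b_hidden)))

# Placeholder data and (possibly pruned) RBM parameters for illustration.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((1000, 784)), rng.integers(0, 10, 1000)
X_test,  y_test  = rng.random((200, 784)),  rng.integers(0, 10, 200)
W, b = rng.normal(scale=0.01, size=(784, 500)), np.zeros(500)

# Highest-layer features -> logistic regression classifier.
clf = LogisticRegression(max_iter=200)
clf.fit(rbm_hidden_probs(X_train, W, b), y_train)
print("test accuracy:", clf.score(rbm_hidden_probs(X_test, W, b), y_test))
```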
Conclusion
  • Inspired by the parameter redundancy in deep deterministic neural networks, such as CNNs, the same problem is confirmed for deep PGNs in this paper.
  • More than 50% of their weight connections can be removed on the MNIST, Fashion-MNIST and CIFAR-10 datasets, while retaining or even improving their generative and discriminative capabilities.
  • The experiments show a spectrum of pruning percentages for deep PGNs, and the capabilities of these models vary continuously with the pruning percentage, so a compromise can be reached between model capacity and redundancy.
  • Whether compressed or not, deep PGNs perform better on grayscale image datasets, whereas deep convolutional networks have the advantage on color image datasets.
Tables
  • Table 1: Feature extraction with pruned RBMs and a logistic regression classifier on MNIST, Fashion-MNIST and CIFAR-10. Pec denotes the pruning percentage and Acc the classification accuracy of the logistic regression
  • Table 2: Classification accuracy of networks initialized by pruned DBNs and DBMs on MNIST, Fashion-MNIST and CIFAR-10. Pec denotes the pruning percentage
  • Table 3: Classification accuracy of the applied compressed networks on MNIST, Fashion-MNIST and CIFAR-10. Pec denotes the pruning percentage
Funding
  • This research is sponsored in part by the National Natural Science Foundation of China under Grant Nos. 61603096, 61751202, 61751205, 61572540, U1813203 and U1801262, and by the Natural Science Foundation of Fujian Province under Grant No. 2017J01750
Reference
  • J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, T. Chen, Recent advances in convolutional neural networks, Pattern Recognit. 77 (2018) 354–377.
  • E. Shelhamer, J. Long, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 39 (4) (2017) 640–651.
  • G.W. Taylor, G.E. Hinton, S. Roweis, Modeling human motion using binary latent variables, in: Advances in Neural Information Processing Systems, MIT Press, 2007, pp. 1345–1352.
  • J. Chang, L. Wang, G. Meng, S. Xiang, C. Pan, Deep unsupervised learning with consistent inference of latent representations, Pattern Recognit. 77 (2018) 438–453.
  • C.M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, 2006.
  • C.Y. Zhang, C.L.P. Chen, D. Chen, N.G. Kin Tek, MapReduce based distributed learning algorithm for restricted Boltzmann machine, Neurocomputing 198 (2016) 4–11.
  • G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (7) (2006) 1527–1554.
  • R. Salakhutdinov, G. Hinton, An efficient learning procedure for deep Boltzmann machines, Neural Comput. 24 (8) (2012) 1967–2006.
  • R. Salakhutdinov, A. Mnih, G. Hinton, Restricted Boltzmann machines for collaborative filtering, in: Proceedings of the 24th International Conference on Machine Learning, 2007, pp. 791–798.
  • A. Mohamed, G. Dahl, G. Hinton, Deep belief networks for phone recognition, in: NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, vol. 1, Vancouver, Canada, 2009, p. 39.
  • T. Baltrušaitis, C. Ahuja, L. Morency, Multimodal machine learning: a survey and taxonomy, IEEE Trans. Pattern Anal. Mach. Intell. 41 (2) (2019) 423–443.
  • N. Srivastava, R.R. Salakhutdinov, G.E. Hinton, Modeling documents with deep Boltzmann machines, 2013, pp. 1–8. arXiv preprint arXiv:1309.6865.
  • C.Y. Zhang, C.L.P. Chen, M. Gan, L. Chen, Predictive deep Boltzmann machine for multiperiod wind speed forecasting, IEEE Trans. Sustain. Energy 6 (4) (2017) 1416–1425.
  • Y. Wang, B. Dai, G. Hua, J. Aston, D. Wipf, Recurrent variational autoencoders for learning nonlinear generative models in the presence of outliers, IEEE J. Sel. Topics Signal Process. 12 (6) (2018) 1615–1627.
  • Z. Gou, L. Han, L. Sun, J. Zhu, H. Yan, Constructing dynamic topic models based on variational autoencoder and factor graph, IEEE Access 6 (2018) 53102–53111.
  • A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
  • S. Han, H. Mao, W.J. Dally, Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding, in: ICLR, 2016.
  • S. Han, J. Pool, J. Tran, W. Dally, Learning both weights and connections for efficient neural network, in: Advances in Neural Information Processing Systems, 2015, pp. 1135–1143.
  • J. Yang, W. Xiong, S. Li, C. Xu, Learning structured and non-redundant representations with deep neural networks, Pattern Recognit. 86 (2019) 224–235.
  • B. Hou, Y. Wang, Q. Liu, Change detection based on deep features and low rank, IEEE Geosci. Remote Sens. Lett. 14 (12) (2017) 2418–2422.
  • Z. Chen, Z. Cao, J. Guo, Distilling the knowledge from handcrafted features for human activity recognition, IEEE Trans. Ind. Inf. 14 (10) (2018) 4334–4342.
  • D. Koller, N. Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009.
  • H. Lee, R. Grosse, R. Ranganath, A.Y. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, 2009, pp. 609–616.
  • D. Chen, J. Lv, Z. Yi, Graph regularized restricted Boltzmann machine, IEEE Trans. Neural Netw. Learn. Syst. 29 (6) (2018) 2651–2659.
  • Z. Chen, N.L. Zhang, D.-Y. Yeung, P. Chen, Sparse Boltzmann machines with structure learning as applied to text analysis, in: AAAI, 2017, pp. 1805–1811.
  • N. Meinshausen, P. Bühlmann, High-dimensional graphs and variable selection with the lasso, Ann. Stat. 34 (3) (2006) 1436–1462.
  • J. Friedman, T. Hastie, R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9 (3) (2008) 432–441.
  • Y. LeCun, The MNIST database of handwritten digits, 1998. http://yann.lecun.com/exdb/mnist/.
  • H. Xiao, K. Rasul, R. Vollgraf, Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017. https://github.com/zalandoresearch/fashion-mnist.
  • A. Krizhevsky, G. Hinton, Learning multiple layers of features from tiny images, Technical Report, Citeseer, 2009.
  • C.L.P. Chen, C.Y. Zhang, L. Chen, M. Gan, Fuzzy restricted Boltzmann machine for the enhancement of deep learning, IEEE Trans. Fuzzy Syst. 23 (6) (2015) 2163–2173.
  • Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn. 2 (1) (2009) 1–127.
  • G.E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Comput. 14 (8) (2002) 1771–1800.
  • G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.
  • R.M. Neal, Connectionist learning of belief networks, Artif. Intell. 56 (1) (1992) 71–113.
  • C. Louizos, K. Ullrich, M. Welling, Bayesian compression for deep learning, in: Advances in Neural Information Processing Systems, 2017, pp. 3288–3298.
  • S. Bak, Generalized linear regression model with LASSO, group LASSO, and sparse group LASSO regularization methods for finding bacteria associated with colorectal cancer using microbiome data, Ph.D. thesis, 2017.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
  • C.L.P. Chen, Z. Liu, Broad learning system: an effective and efficient incremental learning system without the need for deep architecture, IEEE Trans. Neural Netw. Learn. Syst. 29 (1) (2017) 10–24.
Authors
  • Chun-Yang Zhang received the B.S. degree in Mathematics from Beijing Normal University Zhuhai, China, in 2010, the M.S. degree in Mathematics from the University of Macau, Macau, in 2012, and the Ph.D. degree in Computer Science from the University of Macau, Macau, in 2015. He is currently an associate professor in the School of Mathematics and Computer Science at Fuzhou University. His research interests include machine learning, computer vision, computational intelligence, and big data analysis.
  • Qi Zhao received the B.S. degree in Software Engineering from Fuzhou University, Fuzhou, China, in 2017 and is currently a graduate student at Fuzhou University. His research interests include deep learning, probabilistic graphical models and model compression.
  • Wenxi Liu received the B.S. degree in Computer Science from Shenzhen University, China, in 2010 and the Ph.D. degree in Computer Science from City University of Hong Kong, Hong Kong, China, in 2015. He is currently an associate professor in the School of Mathematics and Computer Science at Fuzhou University. His research interests include machine learning and computer vision.