# Deep compression of probabilistic graphical networks

Pattern Recognition, 106979, 2019.

Abstract:

Probabilistic Graphical Models (PGMs) are important and active research areas in machine learning and artificial intelligence. The well-known representatives of PGMs include Restricted Boltzmann Machines (RBMs), Deep Belief Networks (DBNs), Deep Boltzmann Machines (DBMs), and their variants. These PGMs open a new dimension of machine learning […]


Introduction

- Probabilistic Graphical Models (PGMs) play a significant role in modern machine learning; their nodes represent random variables and their connections denote the statistical dependencies between nodes.
- PGMs offer strong advantages in data representation and have achieved great success in pattern recognition and artificial intelligence, such as image recognition [1] and segmentation [2], and human motion generation [3] and recognition [4].
- There are three advantageous properties of PGMs. The first one is that they provide a simple way to specify the structure of a probabilistic model via graph representation.

Highlights

- Probabilistic Graphical Models (PGMs) play a significant role in modern machine learning; their nodes represent random variables and their connections denote the statistical dependencies between nodes
- We demonstrate classification rates of the compressed networks applied to Probabilistic Graphical Models, where the Probabilistic Graphical Networks are not pruned in a layer-by-layer manner
- Inspired by the parameter redundancy observed in deep deterministic neural networks, such as convolutional neural networks, the same redundancy in deep Probabilistic Graphical Networks is confirmed in this paper
- We use a percentage- and magnitude-based pruning method, which is simple and widely used in deep compression techniques, to fill the gap between deep compression techniques and deep Probabilistic Graphical Networks
- More than 50% of their weight connections can be removed on the MNIST, Fashion-MNIST and CIFAR-10 datasets, while retaining or even improving their generative and discriminative capabilities
- The experiments show a spectrum of pruning percentages for deep Probabilistic Graphical Networks, and the capabilities of these models vary continuously with the pruning percentage, so a compromise can be reached between model capacity and redundancy
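The percentage- and magnitude-based pruning named above can be sketched as follows. This is a minimal illustration under assumed conventions (NumPy arrays, a global pruning percentage), not the paper's exact implementation, and the retraining step is omitted:

```python
import numpy as np

def prune_by_percentage(weights, pct):
    """Zero out the smallest-magnitude fraction `pct` of weights.

    Returns the pruned weight matrix and a boolean mask of the
    surviving connections (the mask would be held fixed during
    any subsequent retraining).
    """
    flat = np.abs(weights).ravel()
    k = int(pct * flat.size)          # number of connections to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # Magnitude threshold: the k-th smallest absolute weight.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Example: prune 50% of a random 4x4 weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_pruned, mask = prune_by_percentage(W, 0.5)
```

The same routine would apply to each weight matrix of an RBM, DBN, or DBM; retraining the surviving weights after pruning is what lets capability be retained or even improved.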

Results

- The authors use the developed pruning and retraining approach to evaluate parameter redundancy of the deep PGNs and compression performance.
- For DBNs and DBMs, the raw and pruned generative models are used to initialize corresponding discriminative ones, and further evaluate their capabilities of initialization and recognition.
- For these models, the latent nodes at the highest layer are regarded as input features for a logistic regression classifier to construct the corresponding recognition models.
- The code is implemented in PyTorch 0.4.0, and the experiment environment is a workstation with an Intel Xeon(R) E5-2640 v4 CPU @ 2.40 GHz (×40), 32 GB of memory, and a Tesla K40 GPU.
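The feature-extraction setup above can be sketched as follows. The weights, biases, and data here are synthetic placeholders standing in for a trained (possibly pruned) RBM, and scikit-learn's logistic regression is used for illustration; the paper's own pipeline is in PyTorch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def rbm_hidden_features(v, W, b_h):
    """Deterministic hidden-unit activations sigmoid(v @ W + b_h),
    used as input features for the downstream classifier."""
    return 1.0 / (1.0 + np.exp(-(v @ W + b_h)))

# Toy stand-ins for a trained RBM and a binary image dataset.
rng = np.random.default_rng(0)
v = rng.integers(0, 2, size=(200, 20)).astype(float)  # "images"
W = rng.normal(scale=0.1, size=(20, 8))               # hypothetical weights
b_h = np.zeros(8)
y = (v.sum(axis=1) > 10).astype(int)                  # synthetic labels

features = rbm_hidden_features(v, W, b_h)
clf = LogisticRegression(max_iter=1000).fit(features, y)
acc = clf.score(features, y)
```

For a DBN or DBM the same idea applies, except `features` would be the activations of the highest latent layer after propagating through all stacked layers.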

Conclusion

- Inspired by the parameter redundancy in deep deterministic neural networks, such as CNNs, the same redundancy in deep PGNs is confirmed in this paper.
- More than 50% of their weight connections can be removed on the MNIST, Fashion-MNIST and CIFAR-10 datasets, while retaining or even improving their generative and discriminative capabilities.
- The experiments show a spectrum of pruning percentages for deep PGNs, and the capabilities of these models vary continuously with the pruning percentage, so a compromise can be reached between model capacity and redundancy.
- Whether compressed or not, deep PGNs perform better on gray-scale image datasets.
- Deep convolutional networks, in contrast, are better suited to color image datasets.


- Table 1: Feature extraction with pruned RBMs and a logistic regression classifier on MNIST, Fashion-MNIST and CIFAR-10. Pec denotes the pruning percentage; Acc denotes the classification accuracy of logistic regression
- Table 2: Classification accuracy of models initialized by pruned DBNs and DBMs on MNIST, Fashion-MNIST and CIFAR-10. Pec denotes the pruning percentage
- Table 3: Classification accuracy of the applied compressed networks on MNIST, Fashion-MNIST and CIFAR-10. Pec denotes the pruning percentage

Funding

- This research is sponsored in part by the National Natural Science Foundation of China under Grant Nos. 61603096, 61751202, 61751205, 61572540, U1813203 and U1801262, and by the Natural Science Foundation of Fujian Province under Grant No. 2017J01750.

References

- J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, T. Chen, Recent advances in convolutional neural networks, Pattern Recognit. 77 (2018) 354–377.
- E. Shelhamer, J. Long, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 39 (4) (2017) 640–651.
- G.W. Taylor, G.E. Hinton, S. Roweis, Modeling human motion using binary latent variables, in: Advances in Neural Information Processing Systems, MIT Press, 2007, pp. 1345–1352.
- J. Chang, L. Wang, G. Meng, S. Xiang, C. Pan, Deep unsupervised learning with consistent inference of latent representations, Pattern Recognit. 77 (2018) 438–453.
- C.M. Bishop, Pattern recognition and machine learning (information science and statistics).
- C.Y. Zhang, C.L.P. Chen, D. Chen, N.G. Kin Tek, Mapreduce based distributed learning algorithm for restricted Boltzmann machine, Neurocomputing 198 (2016) 4–11.
- G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (7) (2006) 1527–1554.
- R. Salakhutdinov, G. Hinton, An efficient learning procedure for deep Boltzmann machines, Neural Comput. 24 (8) (2012) 1967–2006.
- R. Salakhutdinov, A. Mnih, G. Hinton, Restricted Boltzmann machines for collaborative filtering, in: Proceedings of the 24th international conference on Machine learning, 2007, pp. 791–798.
- A.-r. Mohamed, G. Dahl, G. Hinton, Deep belief networks for phone recognition, in: Nips workshop on deep learning for speech recognition and related applications, 1, Vancouver, Canada, 2009, p. 39.
- T. Baltrušaitis, C. Ahuja, L. Morency, Multimodal machine learning: a survey and taxonomy, IEEE Trans. Pattern Anal. Mach. Intell. 41 (2) (2019) 423–443.
- N. Srivastava, R.R. Salakhutdinov, G.E. Hinton, Modeling documents with deep Boltzmann machines, 2013, pp. 1–8. arXiv preprint arXiv:1309.6865.
- C.Y. Zhang, C.L.P. Chen, M. Gan, L. Chen, Predictive deep Boltzmann machine for multiperiod wind speed forecasting, IEEE Trans. Sustain. Energy 6 (4) (2017) 1416–1425.
- Y. Wang, B. Dai, G. Hua, J. Aston, D. Wipf, Recurrent variational autoencoders for learning nonlinear generative models in the presence of outliers, IEEE J. Sel. Topic. Signal Process. 12 (6) (2018) 1615–1627.
- Z. Gou, L. Han, L. Sun, J. Zhu, H. Yan, Constructing dynamic topic models based on variational autoencoder and factor graph, IEEE Access 6 (2018) 53102–53111.
- A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in: International Conference on Neural Information Processing Systems, 2012, pp. 1097–1105.
- S. Han, H. Mao, W.J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, ICLR, 2016.
- S. Han, J. Pool, J. Tran, W. Dally, Learning both weights and connections for efficient neural network, in: Advances in neural information processing systems, 2015, pp. 1135–1143.
- J. Yang, W. Xiong, S. Li, C. Xu, Learning structured and non-redundant representations with deep neural networks, Pattern Recognit. 86 (2019) 224–235.
- B. Hou, Y. Wang, Q. Liu, Change detection based on deep features and low rank, IEEE Geosci. Remote Sens. Lett. 14 (12) (2017) 2418–2422.
- Z. Chen, Z. Cao, J. Guo, Distilling the knowledge from handcrafted features for human activity recognition, IEEE Trans. Ind. Inf. 14 (10) (2018) 4334–4342.
- D. Koller, N. Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009.
- H. Lee, R. Grosse, R. Ranganath, A.Y. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, 2009, pp. 609–616.
- D. Chen, J. Lv, Z. Yi, Graph regularized restricted Boltzmann machine, IEEE Trans. Neural Netw. Learn.Syst. 29 (6) (2018) 2651–2659.
- Z. Chen, N.L. Zhang, D.-Y. Yeung, P. Chen, Sparse Boltzmann machines with structure learning as applied to text analysis., in: AAAI, 2017, pp. 1805–1811.
- N. Meinshausen, P. Bühlmann, et al., High-dimensional graphs and variable selection with the lasso, Annal. Stat. 34 (3) (2006) 1436–1462.
- J. Friedman, T. Hastie, R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9 (3) (2008) 432–441.
- Y. LeCun, The mnist database of handwritten digits, 1998. http://yann.lecun.com/exdb/mnist/.
- H. Xiao, K. Rasul, R. Vollgraf, Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017. https://github.com/zalandoresearch/fashion-mnist.
- A. Krizhevsky, G. Hinton, Learning multiple layers of features from tiny images, Technical Report, Citeseer, 2009.
- C.L.P. Chen, C.Y. Zhang, L. Chen, M. Gan, Fuzzy restricted Boltzmann machine for the enhancement of deep learning, IEEE Trans. Fuzzy Syst. 23 (6) (2015) 2163–2173.
- Y. Bengio, Learning deep architectures for ai, Found. Trend. Mach. Learn. 2 (1) (2009) 1–127.
- G.E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Comput. 14 (8) (2002) 1771–1800.
- G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.
- R.M. Neal, Connectionist learning of belief networks, Artif. Intell. 56 (1) (1992) 71–113.
- C. Louizos, K. Ullrich, M. Welling, Bayesian compression for deep learning, in: Advances in Neural Information Processing Systems, 2017, pp. 3288–3298.
- S. Bak, Generalized linear regression model with LASSO, group LASSO, and sparse group LASSO regularization methods for finding bacteria associated with colorectal cancer using microbiome data, Ph.D. thesis, 2017.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in neural information processing systems, 2014, pp. 2672–2680.
- C.L.P. Chen, Z. Liu, Broad learning system: an effective and efficient incremental learning system without the need for deep architecture, IEEE Trans. Neural Netw. Learn. Syst. 29 (1) (2017) 10–24.

Chun-Yang Zhang received the B.S. degree in Mathematics from Beijing Normal University Zhuhai, China, in 2010 and the M.S. degree in Mathematics from the University of Macau, Macau, in 2012. He also received the Ph.D. degree in Computer Science from the University of Macau, Macau, in 2015. He is currently an associate professor in the School of Mathematics and Computer Science at Fuzhou University. His research interests include machine learning, computer vision, computational intelligence, and big data analysis.

Qi Zhao received the B.S. degree in Software Engineering from Fuzhou University, Fuzhou, China, in 2017 and is currently a graduate student at Fuzhou University. His research interests include deep learning, probabilistic graphical models and model compression.

Wenxi Liu received the B.S. degree in Computer Science from Shenzhen University, China, in 2010 and the Ph.D. degree in Computer Science from the City University of Hong Kong, Hong Kong, China, in 2015. He is currently an associate professor in the School of Mathematics and Computer Science at Fuzhou University. His research interests include machine learning and computer vision.
