# Orthogonal Convolutional Neural Networks

CVPR, pp. 11502-11512, 2019.

Keywords:

Fréchet inception distance, deep neural network, generative adversarial networks, kernel orthogonality, orthogonal CNN

Abstract:

The instability and feature redundancy in CNNs hinder further performance improvement. Using orthogonality as a regularizer has shown success in alleviating these issues. Previous works, however, only considered kernel orthogonality in the convolution layers of CNNs, which is a necessary but not sufficient condition for orthogonal convolutions.

Introduction

- While convolutional neural networks (CNNs) are widely successful [36, 14, 50], several challenges still exist: over-parameterization or under-utilization of model capacity [21, 12], exploding or vanishing gradients [7, 17], growth in saddle points [13], and shifts in feature statistics [31].
- The model capacity is better utilized, which improves feature expressiveness and task performance.

Highlights

- While convolutional neural networks (CNNs) are widely successful [36, 14, 50], several challenges still exist: over-parameterization or under-utilization of model capacity [21, 12], exploding or vanishing gradients [7, 17], growth in saddle points [13], and shifts in feature statistics [31]
- We propose orthogonal convolutional neural networks (OCNN), where a convolutional layer is regularized with orthogonality constraints during training
- We show that our regularization enforces orthogonal convolutions more effectively than kernel orthogonality methods, and we further develop an efficient approach for our orthogonal CNN regularization
- The third set of experiments focuses on the robustness of orthogonal CNN under adversarial attacks
- We develop an efficient orthogonal CNN approach to impose a filter orthogonality condition on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel, as opposed to the commonly adopted kernel orthogonality approaches
- Our orthogonal CNN requires no additional parameters and little computational overhead, consistently outperforming the state-of-the-art alternatives on a wide range of tasks such as image classification and inpainting under supervised, semi-supervised and unsupervised settings
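The DBT-based condition in the highlights can be sketched without ever building the huge doubly block-Toeplitz matrix: the orthogonality of that matrix is equivalent to the kernel's self-correlation matching an identity-like target (1 only at the center of each m = n map). Below is a minimal stride-1, pure-Python sketch under that reading of the paper's condition; the function name `conv_orth_loss` and the nested-list kernel layout are our own illustrative conventions, not the paper's code.

```python
def conv_orth_loss(K, pad=None):
    # OCNN-style regularizer sketch: correlate the kernel with itself and
    # compare against an "identity" target (1 at the center of each m == n map).
    # K has shape M x C x k x k as nested lists; stride 1 is assumed.
    M, C, k = len(K), len(K[0]), len(K[0][0])
    pad = k - 1 if pad is None else pad
    size = 2 * pad + 1  # spatial size of each self-correlation map
    loss = 0.0
    for m in range(M):
        for n in range(M):
            for p in range(size):
                for q in range(size):
                    s = 0.0
                    for c in range(C):
                        for u in range(k):
                            for v in range(k):
                                up, vq = u + p - pad, v + q - pad
                                if 0 <= up < k and 0 <= vq < k:
                                    s += K[m][c][u][v] * K[n][c][up][vq]
                    target = 1.0 if (m == n and p == pad and q == pad) else 0.0
                    loss += (s - target) ** 2
    return loss
```

A kernel whose filters are orthonormal under this correlation (e.g. per-channel delta filters) drives the loss to zero; any scaling or cross-filter overlap is penalized.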

Methods

- The authors conduct 3 sets of experiments to evaluate OCNNs. The first set benchmarks the approach on image classification datasets CIFAR100 and ImageNet. The second set benchmarks the performance under semi-supervised settings and focuses on qualities of learned features.
- For high-level visual feature qualities, the authors experiment on fine-grained bird image retrieval.
- For low-level visual features, the authors experiment on unsupervised image inpainting.
- The authors compare visual feature qualities in image generation tasks.
- The third set of experiments focuses on the robustness of OCNN under adversarial attacks.
- The authors analyze OCNNs in terms of the spectrum of the doubly block-Toeplitz (DBT) matrix K, feature similarity, hyperparameter tuning, and space/time complexity.

Results

- The authors' approach achieves 78.1%, 78.7%, and 79.5% image classification accuracies with ResNet18, ResNet34 and ResNet50, respectively.
- The authors' OCNNs achieve gains of 3% and 1% over plain baselines and kernel-orthogonality regularizers, respectively.
- The authors' approach achieves the highest accuracy when λ = 0.1.
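For context on what λ weighs: orthogonality regularizers of this kind are typically added to the task loss as a weighted penalty minimized jointly during training. A trivial sketch (the helper name `total_loss` is illustrative, not from the paper):

```python
def total_loss(task_loss, orth_loss, lam=0.1):
    # The orthogonality term enters training as a weighted penalty;
    # lam = 0.1 is the best-performing weight reported above.
    return task_loss + lam * orth_loss
```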

Conclusion

- The authors develop an efficient OCNN approach to impose a filter orthogonality condition on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel, as opposed to the commonly adopted kernel orthogonality approaches.
- The authors' OCNN requires no additional parameters and little computational overhead, consistently outperforming the state-of-the-art alternatives on a wide range of tasks such as image classification and inpainting under supervised, semi-supervised and unsupervised settings.
- It learns more diverse and expressive features with better training stability, robustness, and generalization.
- The authors thank Xudong Wang for discussions on filter similarity, Jesse Livezey for the pointer to a previous proof for row-column orthogonality equivalence, Haoran Guo, Ryan Zarcone, and Pratik Sachdeva for proofreading, and anonymous reviewers for their insightful comments.

- Table1: Summary of experiments and OCNN gains
- Table2: Top-1 accuracies on CIFAR100. Our OCNN outperforms baselines and the SOTA orthogonal regularizations
- Table3: WideResNet [59] performance. We observe improved performance of OCNNs
- Table4: Top-1 and Top-5 errors on ImageNet [14] with ResNet34
- Table5: Top-1 accuracies on CIFAR100 with different fractions of labeled data. OCNNs are consistently better
- Table6: Quantitative comparisons on the standard inpainting dataset [27]. Our conv-orthogonality outperforms the SOTA methods
- Table7: Inception Score and Fréchet Inception Distance comparison on CIFAR10. Our OCNN outperforms the baseline [18] by 0.3 IS and 1.3 FID
- Table8: Attack time and number of necessary attack queries needed for 90% successful attack rate
- Table9: Model size and training/test time on ImageNet [14]
- Table10: Number of unique detectors (feature channels with mIoU ≥ 0.04) comparisons on ImageNet [14]
- Table11: Retrieval/clustering performance on Cars196 (%)

Related work

- Im2col-Based Convolutions. The im2col method [58, 26] has been widely used in deep learning because it enables efficient GPU computation: it transforms the convolution into a general matrix-matrix multiplication (GEMM).

Fig. 2a illustrates the procedure. a) Given an input X, we first construct an input-patch matrix X̃ ∈ R^(Ck² × H′W′) by copying patches from the input and unrolling them into columns of this intermediate matrix. b) The kernel-patch matrix K ∈ R^(M × Ck²) can then be constructed by reshaping the original kernel tensor; we use the same notation K for simplicity. c) We can then calculate the output Y = KX̃, where Y is reshaped back to a tensor of size M × H′ × W′ – the desired output of the convolution.
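The three steps above can be sketched in pure Python for the stride-1, no-padding case. The function name `im2col_conv` and the nested-list tensor layout are illustrative conventions, not the paper's code:

```python
# Minimal im2col-based 2D convolution sketch (stride 1, no padding).
# Shapes follow the text: input X is C x H x W, kernel is M x C x k x k.

def im2col_conv(X, kernel):
    C, H, W = len(X), len(X[0]), len(X[0][0])
    M = len(kernel)
    k = len(kernel[0][0])
    Hp, Wp = H - k + 1, W - k + 1  # output spatial size H', W'

    # a) input-patch matrix: each column is one unrolled C*k*k patch
    cols = [[X[c][i + di][j + dj]
             for i in range(Hp) for j in range(Wp)]
            for c in range(C) for di in range(k) for dj in range(k)]

    # b) kernel-patch matrix: reshape the kernel to M x (C*k*k),
    #    using the same (c, di, dj) unrolling order as the patches
    K = [[kernel[m][c][di][dj]
          for c in range(C) for di in range(k) for dj in range(k)]
         for m in range(M)]

    # c) GEMM: Y = K @ cols, then reshape back to M x H' x W'
    Y = [[sum(K[m][r] * cols[r][p] for r in range(len(cols)))
          for p in range(Hp * Wp)] for m in range(M)]
    return [[Y[m][i * Wp:(i + 1) * Wp] for i in range(Hp)] for m in range(M)]
```

In practice the GEMM step is what maps onto optimized BLAS/GPU kernels; the unrolling duplicates input values k² times, trading memory for a single dense multiply.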

The kernel orthogonality regularization enforces the kernel K ∈ R^(M × Ck²) to be orthogonal. Specifically, if M ≤ Ck², the row-orthogonality regularizer is L_korth-row = ‖KK^T − I‖_F, where I is the identity matrix. Otherwise, column orthogonality may be enforced by L_korth-col = ‖K^T K − I‖_F.
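The row-orthogonality case can be computed directly from the definition. A pure-Python sketch (`kernel_orth_loss` is an illustrative name; pass the transpose of K for the column case):

```python
import math

def kernel_orth_loss(K):
    # Row-orthogonality regularizer ||K K^T - I||_F for K of shape M x (C*k^2).
    # Assumes M <= C*k^2 (row case); transpose K first for the column case.
    M = len(K)
    # Gram matrix G = K K^T
    G = [[sum(a * b for a, b in zip(K[i], K[j])) for j in range(M)]
         for i in range(M)]
    # Frobenius norm of G - I
    return math.sqrt(sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(M) for j in range(M)))
```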

Funding

- This research was supported, in part, by Berkeley Deep Drive, DARPA, and NSF-IIS-1718991

References

- M. Arjovsky, A. Shah, and Y. Bengio. Unitary evolution recurrent neural networks. In ICML, pages 1120–1128, 2016. 2, 3
- J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016. 3
- R. Balestriero and R. Baraniuk. Mad max: Affine spline insights into deep learning. arXiv preprint arXiv:1805.06576, 2018. 2
- R. Balestriero et al. A spline theory of deep networks. In ICML, pages 383–392, 2018. 2
- N. Bansal, X. Chen, and Z. Wang. Can we gain more from orthogonality regularizations in training deep cnns? In Advances in Neural Information Processing Systems (NeurIPS), pages 4266–4276, 2018. 2, 3, 5, 6, 8
- D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba. Network dissection: Quantifying interpretability of deep visual representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6541–6549, 2017. 11
- Y. Bengio, P. Simard, P. Frasconi, et al. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994. 1
- A. Brock, J. Donahue, and K. Simonyan. Large scale GAN training for high fidelity natural image synthesis. In ICLR, 2019. 2, 3, 7
- A. Brock, T. Lim, J. M. Ritchie, and N. Weston. Neural photo editing with introspective adversarial networks. In ICLR, 2017. 3, 7
- M. L. Casado and D. Martínez-Rubio. Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group. In ICML, pages 3794–3803, 2019. 3
- Y. Chen, X. Jin, J. Feng, and S. Yan. Training group orthogonal neural networks with privileged information. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1532–1538, 2017. 3
- B. Cheung, A. Terekhov, Y. Chen, P. Agrawal, and B. A. Olshausen. Superposition of many models into one. In Advances in Neural Information Processing Systems (NeurIPS), pages 10867–10876, 2019. 1
- Y. N. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems (NIPS), pages 2933–2941, 2014. 1
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. FeiFei. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009. 1, 5, 6, 8, 11
- V. Dorobantu, P. A. Stromhaug, and J. Renteria. Dizzyrnn: Reparameterizing recurrent neural networks for norm-preserving backpropagation. arXiv preprint arXiv:1612.04035, 2016. 3
- Y. Du and I. Mordatch. Implicit generation and generalization in energy-based models. Advances in Neural Information Processing Systems (NeurIPS), 2019. 7
- X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pages 249–256, 2010. 1, 3
- X. Gong, S. Chang, Y. Jiang, and Z. Wang. Autogan: Neural architecture search for generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 3224–3234, 2019. 7
- C. Guo, J. R. Gardner, Y. You, A. G. Wilson, and K. Q. Weinberger. Simple black-box adversarial attacks. In Proceedings of the International Conference on Machine Learning (ICML), pages 2484–2493, 2019.
- S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally. EIE: efficient inference engine on compressed deep neural network. In IEEE Annual International Symposium on Computer Architecture (ISCA), pages 243– 254, 2016. 3
- S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. ICLR, 2016. 1
- M. Harandi and B. Fernando. Generalized backpropagation, etude de cas: Orthogonality. arXiv preprint arXiv:1611.05927, 2016. 3
- K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1026–1034, 2015. 3
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016. 5, 6, 7, 8, 11
- Y. He, X. Zhang, and J. Sun. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1389–1397, 2017. 3
- F. Heide, W. Heidrich, and G. Wetzstein. Fast and flexible convolutional sparse coding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5135–5143, 2015. 2, 7
- F. Heide, W. Heidrich, and G. Wetzstein. Fast and flexible convolutional sparse coding. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. 7
- E. Hoffer and N. Ailon. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition, pages 84–92. Springer, 2015. 12
- A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017. 3
- L. Huang, X. Liu, B. Lang, A. W. Yu, Y. Wang, and B. Li. Orthogonal weight normalization: Solution to optimization over multiple dependent stiefel manifolds in deep neural networks. In AAAI Conference on Artificial Intelligence (AAAI), 2018. 3, 5, 6, 8
- S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, pages 448–456, 2015. 1, 3
- M. Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. In Proceedings of the British Machine Vision Conference (BMVC), 2014. 3
- J. Kovacevic, A. Chebira, et al. An introduction to frames. Foundations and Trends in Signal Processing, 2(1):1–94, 2008. 4
- J. Krause, M. Stark, J. Deng, and L. Fei-Fei. 3d object representations for fine-grained categorization. In 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney, Australia, 2013. 12
- A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. Technical report, University of Toronto, Canada, 2009. 5
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1097–1105, 2012. 1
- Q. V. Le, A. Karpenko, J. Ngiam, and A. Y. Ng. Ica with reconstruction cost for efficient overcomplete feature learning. In Advances in Neural Information Processing Systems (NIPS), pages 1017–1025, 2011. 5, 12
- S. Li, S. Bak, P. Carr, and X. Wang. Diversity regularized spatiotemporal attention for video-based person reidentification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 369–378, 2018. 3
- D. Mishkin and J. Matas. All you need is a good init. In ICLR, 2016. 3
- T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. In ICLR, 2018. 2, 3, 7
- Y. Movshovitz-Attias, A. Toshev, T. K. Leung, S. Ioffe, and S. Singh. No fuss distance metric learning using proxies. In Proceedings of the IEEE International Conference on Computer Vision, pages 360–368, 2017. 12
- G. Ostrovski, W. Dabney, and R. Munos. Autoregressive quantile networks for generative modeling. In Proceedings of the International Conference on Machine Learning (ICML), pages 3936–3945, 2018. 7
- M. Ozay and T. Okatani. Optimization on submanifolds of convolution kernels in cnns. arXiv preprint arXiv:1610.07008, 2016. 3
- V. Papyan, Y. Romano, J. Sulam, and M. Elad. Convolutional dictionary learning via local processing. In ICCV, 2017. 7
- R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. In ICML, pages 1310– 1318, 2013. 3
- P. Rodríguez, J. Gonzalez, G. Cucurull, J. M. Gonfaus, and F. X. Roca. Regularizing cnns with locally constrained decorrelations. In ICLR, 2017. 2, 3
- T. Salimans and D. P. Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 901–909, 2016. 3
- A. M. Saxe, J. L. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In ICLR, 2014. 3
- H. Sedghi, V. Gupta, and P. M. Long. The singular values of convolutional layers. In ICLR, 2019. 3
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015. 1
- J. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. In ICLR (workshop track), 2015. 8
- D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 6, 7
- A. Van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al. Conditional image generation with pixelcnn decoders. In Advances in Neural Information Processing Systems (NIPS), pages 4790–4798, 2016. 7
- E. Vorontsov, C. Trabelsi, S. Kadoury, and C. Pal. On orthogonality and learning recurrent networks with long term dependencies. In ICML, pages 3570–3578, 2017. 3
- P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology, 2010. 6
- S. Wisdom, T. Powers, J. Hershey, J. Le Roux, and L. Atlas. Full-capacity unitary recurrent neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 4880–4888, 2016. 3
- D. Xie, J. Xiong, and S. Pu. All you need is beyond a good init: Exploring better solution for training extremely deep convolutional neural networks with orthonormality and modulation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6176–6185, 2017. 2, 3, 5, 6, 7, 8
- K. Yanai, R. Tanno, and K. Okamoto. Efficient mobile implementation of a cnn-based object recognition system. In Proceedings of the International Conference on Multimedia (ICM), pages 362–366, 2016. 2
- S. Zagoruyko and N. Komodakis. Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC), 2016. 5
- H. Zheng, J. Fu, T. Mei, and J. Luo. Learning multi-attention convolutional neural network for fine-grained image recognition. In ICCV, pages 5209–5217, 2017. 3
- J. Zhou, M. N. Do, and J. Kovacevic. Special paraunitary matrices, cayley transform, and multidimensional orthogonal filter banks. IEEE Transactions on Image Processing, 15(2):511–519, 2006. 2
