Orthogonal Convolutional Neural Networks

CVPR, pp. 11502-11512, 2019.

Keywords:
Fréchet inception distance, deep neural network, generative adversarial networks, kernel orthogonality, orthogonal CNN

Abstract:

The instability and feature redundancy in CNNs hinder further performance improvement. Using orthogonality as a regularizer has shown success in alleviating these issues. Previous works, however, only considered kernel orthogonality in the convolution layers of CNNs, which is a necessary but not sufficient condition for orthogonal convolutions.

Introduction
  • While convolutional neural networks (CNNs) are widely successful [36, 14, 50], several challenges still exist: over-parameterization or under-utilization of model capacity [21, 12], exploding or vanishing gradients [7, 17], growth in saddle points [13], and shifts in feature statistics [31].
  • The model capacity is better utilized, which improves feature expressiveness and task performance.
Highlights
  • While convolutional neural networks (CNNs) are widely successful [36, 14, 50], several challenges still exist: over-parameterization or under-utilization of model capacity [21, 12], exploding or vanishing gradients [7, 17], growth in saddle points [13], and shifts in feature statistics [31].
  • We propose orthogonal convolutional neural networks (OCNN), where a convolutional layer is regularized with orthogonality constraints during training
  • We show that our regularization enforces orthogonal convolutions more effectively than kernel orthogonality methods, and we further develop an efficient approach for our orthogonal CNN regularization
  • The third set of experiments focuses on the robustness of orthogonal CNN under adversarial attacks
  • We develop an efficient orthogonal CNN approach to impose a filter orthogonality condition on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel, as opposed to the commonly adopted kernel orthogonality approaches
  • Our orthogonal CNN requires no additional parameters and little computational overhead, consistently outperforming the state-of-the-art alternatives on a wide range of tasks such as image classification and inpainting under supervised, semi-supervised and unsupervised settings
Methods
  • The authors conduct 3 sets of experiments to evaluate OCNNs. The first set benchmarks the approach on image classification datasets CIFAR100 and ImageNet. The second set benchmarks the performance under semi-supervised settings and focuses on qualities of learned features.
  • For high-level visual feature qualities, the authors experiment on fine-grained bird image retrieval.
  • For low-level visual features, the authors experiment on unsupervised image inpainting.
  • The authors compare visual feature qualities in image generation tasks.
  • The third set of experiments focuses on the robustness of OCNN under adversarial attacks.
  • The authors analyze OCNNs in terms of DBT matrix K’s spectrum, feature similarity, hyperparameter tuning, and space/time complexity
Results
  • The authors' approach achieves 78.1%, 78.7%, and 79.5% image classification accuracies with ResNet18, ResNet34 and ResNet50, respectively.
  • The authors' OCNNs achieve 3% and 1% gains over plain baselines and kernel orthogonality regularizers, respectively.
  • The authors' approach achieves the highest accuracy when λ = 0.1.
Conclusion
  • The authors develop an efficient OCNN approach to impose a filter orthogonality condition on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel, as opposed to the commonly adopted kernel orthogonality approaches.
  • The authors' OCNN requires no additional parameters and little computational overhead, consistently outperforming the state-of-the-art alternatives on a wide range of tasks such as image classification and inpainting under supervised, semi-supervised and unsupervised settings.
  • It learns more diverse and expressive features with better training stability, robustness, and generalization.
  • The authors thank Xudong Wang for discussions on filter similarity, Jesse Livezey for the pointer to a previous proof for row-column orthogonality equivalence, Haoran Guo, Ryan Zarcone, and Pratik Sachdeva for proofreading, and anonymous reviewers for their insightful comments
Tables
  • Table1: Summary of experiments and OCNN gains
  • Table2: Top-1 accuracies on CIFAR100. Our OCNN outperforms baselines and the SOTA orthogonal regularizations
  • Table3: WideResNet [<a class="ref-link" id="c59" href="#r59">59</a>] performance. We observe improved performance of OCNNs
  • Table4: Top-1 and Top-5 errors on ImageNet [<a class="ref-link" id="c14" href="#r14">14</a>] with ResNet34
  • Table5: Top-1 accuracies on CIFAR100 with different fractions of labeled data. OCNNs are consistently better
  • Table6: Quantitative comparisons on the standard inpainting dataset [<a class="ref-link" id="c27" href="#r27">27</a>]. Our conv-orthogonality outperforms the SOTA methods
  • Table7: Inception Score and Frchet Inception Distance comparison on CIFAR10. Our OCNN outperforms the baseline [<a class="ref-link" id="c18" href="#r18">18</a>] by 0.3 IS and 1.3 FID
  • Table8: Attack time and number of necessary attack queries needed for 90% successful attack rate
  • Table9: Model size and training/ test time on ImageNet [<a class="ref-link" id="c14" href="#r14">14</a>]
  • Table10: Number of unique detectors (feature channels with mIoU ≥ 0.04) comparisons on ImageNet [<a class="ref-link" id="c14" href="#r14">14</a>]
  • Table11: Retrieval/clustering performance on Cars196 (%)
Related work
  • Im2col-Based Convolutions. The im2col method [58, 26] has been widely used in deep learning as it enables efficient GPU computation. It transforms the convolution into a general matrix-matrix multiplication (GEMM) problem.

    Fig. 2a illustrates the procedure. a) Given an input X, we first construct a new input-patch-matrix X ∈ ℝ^(Ck²×H′W′) by copying patches from the input and unrolling them into columns of this intermediate matrix. b) The kernel-patch-matrix K ∈ ℝ^(M×Ck²) can then be constructed by reshaping the original kernel tensor (here we use the same notation for simplicity). c) We can then calculate the output Y = KX, where we reshape Y back to a tensor of size M × H′ × W′ – the desired output of the convolution.
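The a)–c) steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: `im2col` and `conv2d_gemm` are hypothetical names, and stride 1 with no padding is assumed.

```python
import numpy as np

def im2col(X, k):
    """Unroll every k-by-k patch of X (shape C x H x W) into one column
    of the input-patch-matrix (shape C*k*k x H'*W'), stride 1, no padding."""
    C, H, W = X.shape
    Hp, Wp = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, Hp * Wp))
    for i in range(Hp):
        for j in range(Wp):
            cols[:, i * Wp + j] = X[:, i:i + k, j:j + k].ravel()
    return cols

def conv2d_gemm(X, kernel):
    """Convolution as one GEMM: reshape the kernel (M x C x k x k) into the
    kernel-patch-matrix K (M x C*k*k), multiply, and fold back to M x H' x W'."""
    M, C, k, _ = kernel.shape
    K = kernel.reshape(M, C * k * k)          # kernel-patch-matrix
    Hp, Wp = X.shape[1] - k + 1, X.shape[2] - k + 1
    return (K @ im2col(X, k)).reshape(M, Hp, Wp)
```

Each output element can be checked against the direct definition: Y[m, i, j] equals np.sum(kernel[m] * X[:, i:i+k, j:j+k]).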

    The orthogonal kernel regularization enforces the kernel K ∈ ℝ^(M×Ck²) to be orthogonal. Specifically, if M ≤ Ck², the row orthogonality regularizer is L_korth-row = ‖KKᵀ − I‖_F, where I is the identity matrix. Otherwise, column orthogonality may be enforced with L_korth-col = ‖KᵀK − I‖_F.
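A minimal NumPy sketch of this kernel orthogonality penalty (illustrative, not the paper's implementation; `kernel_orth_loss` is a hypothetical name):

```python
import numpy as np

def kernel_orth_loss(kernel):
    """Kernel orthogonality penalty on the reshaped kernel K (M x C*k*k):
    the row version ||K K^T - I||_F when M <= C*k*k, and the column
    version ||K^T K - I||_F otherwise."""
    M = kernel.shape[0]
    K = kernel.reshape(M, -1)                  # M x (C*k*k)
    if M <= K.shape[1]:
        gram, I = K @ K.T, np.eye(M)           # row orthogonality
    else:
        gram, I = K.T @ K, np.eye(K.shape[1])  # column orthogonality
    return np.linalg.norm(gram - I, ord='fro')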
Funding
  • This research was supported, in part, by Berkeley Deep Drive, DARPA, and NSF-IIS-1718991
Reference
  • M. Arjovsky, A. Shah, and Y. Bengio. Unitary evolution recurrent neural networks. In ICML, pages 1120–1128, 2016. 2, 3
    Google ScholarLocate open access versionFindings
  • J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016. 3
    Findings
  • R. Balestriero and R. Baraniuk. Mad max: Affine spline insights into deep learning. arXiv preprint arXiv:1805.06576, 2018. 2
    Findings
  • R. Balestriero et al. A spline theory of deep networks. In ICML, pages 383–392, 2018. 2
    Google ScholarLocate open access versionFindings
  • N. Bansal, X. Chen, and Z. Wang. Can we gain more from orthogonality regularizations in training deep cnns? In Advances in Neural Information Processing Systems (NeurIPS), pages 4266–4276, 2018. 2, 3, 5, 6, 8
    Google ScholarLocate open access versionFindings
  • D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba. Network dissection: Quantifying interpretability of deep visual representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6541–6549, 2017. 11
    Google ScholarLocate open access versionFindings
  • Y. Bengio, P. Simard, P. Frasconi, et al. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994. 1
    Google ScholarLocate open access versionFindings
  • A. Brock, J. Donahue, and K. Simonyan. Large scale GAN training for high fidelity natural image synthesis. In ICLR, 2019. 2, 3, 7
    Google ScholarLocate open access versionFindings
  • A. Brock, T. Lim, J. M. Ritchie, and N. Weston. Neural photo editing with introspective adversarial networks. In ICLR, 2017. 3, 7
    Google ScholarLocate open access versionFindings
  • M. L. Casado and D. Martınez-Rubio. Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group. In ICML, pages 3794– 3803, 2019. 3
    Google ScholarLocate open access versionFindings
  • Y. Chen, X. Jin, J. Feng, and S. Yan. Training group orthogonal neural networks with privileged information. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1532–1538, 2017. 3
    Google ScholarLocate open access versionFindings
  • B. Cheung, A. Terekhov, Y. Chen, P. Agrawal, and B. A. Olshausen. Superposition of many models into one. In Advances in Neural Information Processing Systems (NeurIPS), pages 10867–10876, 2019. 1
    Google ScholarLocate open access versionFindings
  • Y. N. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems (NIPS), pages 2933–2941, 2014. 1
    Google ScholarLocate open access versionFindings
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. FeiFei. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009. 1, 5, 6, 8, 11
    Google ScholarLocate open access versionFindings
  • V. Dorobantu, P. A. Stromhaug, and J. Renteria. Dizzyrnn: Reparameterizing recurrent neural networks for norm-preserving backpropagation. arXiv preprint arXiv:1612.04035, 2016. 3
    Findings
  • Y. Du and I. Mordatch. Implicit generation and generalization in energy-based models. Advances in Neural Information Processing Systems (NeurIPS), 2019. 7
    Google ScholarLocate open access versionFindings
  • X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pages 249–256, 2010. 1, 3
    Google ScholarLocate open access versionFindings
  • X. Gong, S. Chang, Y. Jiang, and Z. Wang. Autogan: Neural architecture search for generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 3224–3234, 2019. 7
    Google ScholarLocate open access versionFindings
  • C. Guo, J. R. Gardner, Y. You, A. G. Wilson, and K. Q. Weinberger. Simple black-box adversarial attacks. In Proceedings of the International Conference on Machine Learning (ICML), pages 2484–2493, 207
    Google ScholarLocate open access versionFindings
  • S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally. EIE: efficient inference engine on compressed deep neural network. In IEEE Annual International Symposium on Computer Architecture (ISCA), pages 243– 254, 2016. 3
    Google ScholarLocate open access versionFindings
  • S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. ICLR, 2016. 1
    Google ScholarLocate open access versionFindings
  • M. Harandi and B. Fernando. Generalized backpropagation, etude de cas: Orthogonality. arXiv preprint arXiv:1611.05927, 2016. 3
    Findings
  • K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision (CVPR), pages 1026–1034, 2015. 3
    Google ScholarLocate open access versionFindings
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016. 5, 6, 7, 8, 11
    Google ScholarLocate open access versionFindings
  • Y. He, X. Zhang, and J. Sun. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1389–1397, 2017. 3
    Google ScholarLocate open access versionFindings
  • F. Heide, W. Heidrich, and G. Wetzstein. Fast and flexible convolutional sparse coding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5135–5143, 2015. 2, 7
    Google ScholarLocate open access versionFindings
  • F. Heide, W. Heidrich, and G. Wetzstein. Fast and flexible convolutional sparse coding. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. 7
    Google ScholarLocate open access versionFindings
  • E. Hoffer and N. Ailon. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition, pages 84–92. Springer, 2015. 12
    Google ScholarLocate open access versionFindings
  • A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017. 3
    Findings
  • L. Huang, X. Liu, B. Lang, A. W. Yu, Y. Wang, and B. Li. Orthogonal weight normalization: Solution to optimization over multiple dependent stiefel manifolds in deep neural networks. In AAAI Conference on Artificial Intelligence (AAAI), 2018. 3, 5, 6, 8
    Google ScholarLocate open access versionFindings
  • S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, pages 448–456, 2015. 1, 3
    Google ScholarLocate open access versionFindings
  • M. Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. In Proceedings of the British Machine Vision Conference (BMVC), 2014. 3
    Google ScholarLocate open access versionFindings
  • J. Kovacevic, A. Chebira, et al. An introduction to frames. Foundations and Trends in Signal Processing, 2(1):1–94, 2008. 4
    Google ScholarLocate open access versionFindings
  • J. Krause, M. Stark, J. Deng, and L. Fei-Fei. 3d object representations for fine-grained categorization. In 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney, Australia, 2013. 12
    Google ScholarLocate open access versionFindings
  • A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. Technical report, University of Toronto, Canada, 2009. 5
    Google ScholarFindings
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1097–1105, 2012. 1
    Google ScholarLocate open access versionFindings
  • Q. V. Le, A. Karpenko, J. Ngiam, and A. Y. Ng. Ica with reconstruction cost for efficient overcomplete feature learning. In Advances in Neural Information Processing Systems (NIPS), pages 1017–1025, 2011. 5, 12
    Google ScholarLocate open access versionFindings
  • S. Li, S. Bak, P. Carr, and X. Wang. Diversity regularized spatiotemporal attention for video-based person reidentification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 369–378, 2018. 3
    Google ScholarLocate open access versionFindings
  • D. Mishkin and J. Matas. All you need is a good init. In ICLR, 2016. 3
    Google ScholarLocate open access versionFindings
  • T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. In ICLR, 2018. 2, 3, 7
    Google ScholarLocate open access versionFindings
  • Y. Movshovitz-Attias, A. Toshev, T. K. Leung, S. Ioffe, and S. Singh. No fuss distance metric learning using proxies. In Proceedings of the IEEE International Conference on Computer Vision, pages 360–368, 2017. 12
    Google ScholarLocate open access versionFindings
  • G. Ostrovski, W. Dabney, and R. Munos. Autoregressive quantile networks for generative modeling. In Proceedings of the International Conference on Machine Learning (ICML), pages 3936–3945, 2018. 7
    Google ScholarLocate open access versionFindings
  • M. Ozay and T. Okatani. Optimization on submanifolds of convolution kernels in cnns. arXiv preprint arXiv:1610.07008, 2016. 3
    Findings
  • V. Papyan, Y. Romano, J. Sulam, and M. Elad. Convolutional dictionary learning via local processing. In ICCV, 2017. 7
    Google ScholarLocate open access versionFindings
  • R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. In ICML, pages 1310– 1318, 2013. 3
    Google ScholarLocate open access versionFindings
  • P. Rodrıguez, J. Gonzalez, G. Cucurull, J. M. Gonfaus, and F. X. Roca. Regularizing cnns with locally constrained decorrelations. In ICLR, 2017. 2, 3
    Google ScholarLocate open access versionFindings
  • T. Salimans and D. P. Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 901–909, 2016. 3
    Google ScholarLocate open access versionFindings
  • A. M. Saxe, J. L. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In ICLR, 2014. 3
    Google ScholarLocate open access versionFindings
  • H. Sedghi, V. Gupta, and P. M. Long. The singular values of convolutional layers. In ICLR, 2019. 3
    Google ScholarLocate open access versionFindings
  • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015. 1
    Google ScholarLocate open access versionFindings
  • J. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. In ICLR (workshop track), 2015. 8
    Google ScholarLocate open access versionFindings
  • D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 6, 7
    Google ScholarLocate open access versionFindings
  • A. Van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al. Conditional image generation with pixelcnn decoders. In Advances in Neural Information Processing Systems (NIPS), pages 4790–4798, 2016. 7
    Google ScholarLocate open access versionFindings
  • E. Vorontsov, C. Trabelsi, S. Kadoury, and C. Pal. On orthogonality and learning recurrent networks with long term dependencies. In ICML, pages 3570–3578, 2017. 3
    Google ScholarLocate open access versionFindings
  • P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology, 2010. 6
    Google ScholarFindings
  • S. Wisdom, T. Powers, J. Hershey, J. Le Roux, and L. Atlas. Full-capacity unitary recurrent neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 4880–4888, 2016. 3
    Google ScholarLocate open access versionFindings
  • D. Xie, J. Xiong, and S. Pu. All you need is beyond a good init: Exploring better solution for training extremely deep convolutional neural networks with orthonormality and modulation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6176–6185, 2017. 2, 3, 5, 6, 7, 8
    Google ScholarLocate open access versionFindings
  • K. Yanai, R. Tanno, and K. Okamoto. Efficient mobile implementation of a cnn-based object recognition system. In Proceedings of the International Conference on Multimedia (ICM), pages 362–366, 2016. 2
    Google ScholarLocate open access versionFindings
  • S. Zagoruyko and N. Komodakis. Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC), 2016. 5
    Google ScholarLocate open access versionFindings
  • H. Zheng, J. Fu, T. Mei, and J. Luo. Learning multi-attention convolutional neural network for fine-grained image recognition. In ICCV, pages 5209–5217, 2017. 3
    Google ScholarLocate open access versionFindings
  • J. Zhou, M. N. Do, and J. Kovacevic. Special paraunitary matrices, cayley transform, and multidimensional orthogonal filter banks. IEEE Transactions on Image Processing, 15(2):511–519, 2006. 2
    Google ScholarLocate open access versionFindings
Your rating :
0

 

Tags
Comments