A Matrix-in-matrix Neural Network for Image Super Resolution

arXiv: Computer Vision and Pattern Recognition, 2019.

Keywords:
image super resolution, neural architecture search, neural network, matrixed channel attention cell, multi-connected channel attention block (15+ more)

Abstract:

In recent years, deep learning methods have achieved impressive results with higher peak signal-to-noise ratio in single image super-resolution (SISR) tasks by utilizing deeper layers. However, their application is quite limited since they require high computing power. In addition, most of the existing methods rarely take full advantage o…

Introduction
  • Single image super-resolution (SISR) attempts to reconstruct a high-resolution (HR) image from its low-resolution (LR) equivalent, which is essentially an ill-posed inverse problem since there are infinitely many HR images that can be downsampled to the same LR image.

    Most work on deep-learning-based SISR has been devoted to achieving higher peak signal-to-noise ratios (PSNR) with ever deeper layers, making the resulting models difficult to fit on mobile devices [20, 21, 32, 41].
  • The CARN architecture has been released for mobile scenarios, but at the cost of reduced PSNR [1].
  • SISR has also been tackled with neural architecture search [5, 6]; the resulting network, FALSR, surpasses CARN at the same level of FLOPs.
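The ill-posedness mentioned above can be seen concretely: distinct HR patches can degrade to an identical LR patch. A minimal numpy sketch, using average pooling as a simplified stand-in for the bicubic degradation (an illustrative assumption, not the paper's exact pipeline):

```python
import numpy as np

def downsample(hr, scale=2):
    """Average-pool downsampling: a simplified stand-in for the
    bicubic degradation used in SISR benchmarks."""
    h, w = hr.shape
    return hr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

# Two distinct HR patches...
hr_a = np.array([[1., 3.], [5., 7.]])
hr_b = np.array([[4., 0.], [2., 10.]])
# ...that degrade to the very same LR pixel: the inverse problem is ill-posed.
assert np.allclose(downsample(hr_a), downsample(hr_b))
```

Since infinitely many HR patches share each LR observation, SR networks must learn a prior over natural images rather than invert the degradation exactly.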
Highlights
  • Single image super-resolution (SISR) attempts to reconstruct a high-resolution (HR) image from its low-resolution (LR) equivalent, which is essentially an ill-posed inverse problem since there are infinitely many HR images that can be downsampled to the same LR image.

    Most work on deep-learning-based SISR has been devoted to achieving higher peak signal-to-noise ratios (PSNR) with ever deeper layers, making the resulting models difficult to fit on mobile devices [20, 21, 32, 41]
  • We introduce a multi-connected channel attention block to construct matrixed channel attention cell (MCAC), which makes full use of the hierarchical features
  • We present matrixed channel attention network (MCAN)-FAST to overcome the inefficiency of the sigmoid function on some mobile devices
  • We propose an accurate and efficient network with matrixed channel attention for the SISR task
  • The result confirms that MCAN-FAST suffers only a small loss of precision compared to MCAN, and still achieves better performance with fewer multi-adds and parameters than state-of-the-art methods
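The sigmoid inefficiency that MCAN-FAST addresses can be illustrated with the fast sigmoid of Georgiou [10], f(x) = x/(1 + |x|), which avoids exp(). The rescaling into (0, 1) below is an illustrative assumption to make the two gates comparable, not the authors' exact formulation:

```python
import math

def sigmoid(x):
    """Standard logistic sigmoid; exp() can be slow or poorly
    supported on some mobile inference hardware."""
    return 1.0 / (1.0 + math.exp(-x))

def fast_sigmoid(x):
    """Fast sigmoid f(x) = x / (1 + |x|) (Georgiou [10]), shifted and
    scaled into (0, 1) here for comparison (an assumption, not the
    paper's exact form); uses only abs() and division, no exp()."""
    return 0.5 * (x / (1.0 + abs(x))) + 0.5

# Both gates map the reals into (0, 1), agree at 0, and saturate at the tails.
assert sigmoid(0.0) == fast_sigmoid(0.0) == 0.5
assert abs(sigmoid(3.0) - fast_sigmoid(3.0)) < 0.1
```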
Methods
  • The quantitative comparison covers SRCNN [7], FSRCNN [8], VDSR [20], DRCN [21], LapSRN [23], DRRN [32], BTSRN [9], MemNet [33], SelNet [4], SRDenseNet [35], CARN [1], CARN-M [1], MoreMNAS-A [6], FALSR-A [5], and the proposed MCAN, MCAN-FAST, MCAN-M, MCAN-S, and MCAN-T models (plus their self-ensemble "+" variants), each listed with its scale and training data (Yang91, G100/G200+Yang91, an ImageNet subset, or DIV2K).
Results
  • Datasets and Evaluation Metrics.
  • The authors train the model on DIV2K [34], which contains 800 2K-resolution images for training and another 100 images each for the validation and test sets.
  • The authors make comparisons across three scaling tasks (×2, ×3, ×4) on four datasets: Set5 [2], Set14 [38], B100 [28], and Urban100 [17].
  • The authors crop 64 × 64 LR patches for the various scale tasks and adopt standard data augmentation
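PSNR, the evaluation metric used in these comparisons, can be sketched as follows (a hypothetical helper, not the authors' evaluation code, which typically also converts to the Y channel and crops image borders):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a ground-truth HR image
    and a reconstruction: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 16 gray levels gives MSE = 256:
ref = np.full((8, 8), 100.0)
rec = ref + 16.0
assert abs(psnr(ref, rec) - 10.0 * np.log10(255.0 ** 2 / 256.0)) < 1e-9
```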
Conclusion
  • The authors proposed an accurate and efficient network with matrixed channel attention for the SISR task.
  • The authors release three additional efficient models of varied sizes, MCAN-M, MCAN-S, and MCAN-T.
  • Extensive experiments reveal that the MCAN family excels over state-of-the-art models of similar or even much larger sizes.
  • The results confirm that MCAN-FAST suffers only a small loss of precision compared to MCAN, and still achieves better performance with fewer multi-adds and parameters than state-of-the-art methods.
Tables
  • Table1: Quantitative comparison with the state-of-the-art methods based on ×2, ×3, ×4 SR with bicubic degradation model. Red/blue text: best/second-best
  • Table2: Network hyperparameters of our networks
  • Table3: Investigations of MIM and EFF. We record the best average PSNR (dB) values on Set5 & Set14 for the ×4 SR task within 10^5 steps
Related work
  • In recent years, deep learning has been applied to many areas of computer vision [11, 14, 26, 30, 39]. A pioneering work [7] brought super-resolution into the deep learning era, proposing a simple three-layer convolutional neural network called SRCNN, in which each layer sequentially handles feature extraction, non-linear mapping, and reconstruction. The input of SRCNN, however, requires an extra bicubic interpolation step, which reduces high-frequency information and adds extra computation. Their later work FSRCNN [8] requires no interpolation and inserts a deconvolution layer for reconstruction, learning an end-to-end mapping. In addition, shrinking and expanding layers are introduced to speed up computation, altogether rendering FSRCNN real-time on a generic CPU.

    Meanwhile, VDSR [20] features global residual learning to ease the training of its very deep network. DRCN applies recursion to share parameters [21]. DRRN builds two residual blocks in a recursive manner [32]. All of them bear the aforementioned problem caused by interpolation, and these very deep architectures undoubtedly require heavy computation.
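The channel attention underlying the matrixed channel attention cell follows the squeeze-and-excitation style popularized for SR by RCAN [41]. A minimal numpy sketch (function and weight names are hypothetical, not the authors' MCAC implementation):

```python
import numpy as np

def channel_attention(features, w_down, w_up):
    """Squeeze-and-excitation-style channel attention (simplified sketch).
    features: (C, H, W); w_down: (C, C//r); w_up: (C//r, C), with
    reduction ratio r."""
    squeezed = features.mean(axis=(1, 2))             # global average pool -> (C,)
    hidden = np.maximum(squeezed @ w_down, 0.0)       # channel-downscaling + ReLU
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w_up)))  # sigmoid gate in (0, 1)
    return features * weights[:, None, None]          # rescale each channel map

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
out = channel_attention(x, rng.standard_normal((8, 2)), rng.standard_normal((2, 8)))
assert out.shape == x.shape
```

The gate is exactly where MCAN-FAST swaps the sigmoid for a faster approximation on mobile devices.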
Reference
  • N. Ahn, B. Kang, and K.-A. Sohn. Fast, accurate, and lightweight super-resolution with cascading residual network. arXiv preprint arXiv:1803.08664, 2018. 1, 2, 4, 5, 6, 7
  • M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. 2017
  • Y. Blau, R. Mechrez, R. Timofte, T. Michaeli, and L. Zelnik-Manor. The 2018 PIRM challenge on perceptual image super-resolution. In European Conference on Computer Vision, pages 334–355. Springer, 2018. 1
  • J.-S. Choi and M. Kim. A deep convolutional neural network with selection units for super-resolution. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, pages 1150–1156. IEEE, 2017. 5
  • X. Chu, B. Zhang, H. Ma, R. Xu, J. Li, and Q. Li. Fast, accurate and lightweight super-resolution with neural architecture search. arXiv preprint arXiv:1901.07261, 2019. 1, 2, 5, 7
  • X. Chu, B. Zhang, R. Xu, and H. Ma. Multi-objective reinforced evolution in mobile neural architecture search. arXiv preprint arXiv:1901.01074, 2019. 1, 5, 7
  • C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In European conference on computer vision, pages 184–199. Springer, 2014. 2, 5
  • C. Dong, C. C. Loy, and X. Tang. Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision, pages 391–407. Springer, 2016. 2, 5, 6, 7
  • Y. Fan, H. Shi, J. Yu, D. Liu, W. Han, H. Yu, Z. Wang, X. Wang, and T. S. Huang. Balanced two-stage residual networks for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 161–168, 2017. 5
  • G. Georgiou. Parallel Distributed Processing in the Complex Domain. PhD thesis, Tulane, 1992. 7
  • R. Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015. 2
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
  • S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015.
  • K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017. 2
  • A. Hore and D. Ziou. Image quality metrics: PSNR vs. SSIM. In 2010 20th International Conference on Pattern Recognition, pages 2366–2369. IEEE, 2010. 7
  • A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
  • J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5197–5206, 2015. 7
  • Z. Hui, X. Wang, and X. Gao. Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 723–731, 201
  • F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360, 2016.
  • J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1646–1654, 2016. 1, 2, 5, 6
  • J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1637–1645, 2016. 1, 2, 3, 5
  • D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. 7
  • W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep Laplacian pyramid networks for fast and accurate super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, page 5, 2017. 2, 5, 6, 7
  • C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint, 2017. 1
  • B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 136–144, 2017. 7
  • W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer, 2016. 2
  • X. Mao, C. Shen, and Y.-B. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in neural information processing systems, pages 2802–2810, 2016. 3
  • D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the IEEE International Conference on Computer Vision, page 416. IEEE, 2001. 7
  • V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814, 2010. 4
  • H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pages 1520–1528, 2015. 2
  • W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1874–1883, 2016. 7
  • Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 1, page 5, 2017. 1, 2, 5
  • Y. Tai, J. Yang, X. Liu, and C. Xu. MemNet: A persistent memory network for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4539–4547, 2017. 5
  • R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, and L. Zhang. NTIRE 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 114–125, 2017. 7
  • T. Tong, G. Li, X. Liu, and Q. Gao. Image super-resolution using dense skip connections. In Proceedings of the IEEE International Conference on Computer Vision, pages 4799–4807, 2017. 2, 3, 4, 5, 6
  • X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy. ESRGAN: Enhanced super-resolution generative adversarial networks. In European Conference on Computer Vision, pages 63–79.
  • Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, et al. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004. 7
  • J. Yang, J. Wright, T. S. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861–2873, 2010. 7
  • R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In European conference on computer vision, pages 649–666.
  • X. Zhang, X. Zhou, M. Lin, and J. Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6848–6856, 2018.
  • Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu. Image super-resolution using very deep residual channel attention networks. In European Conference on Computer Vision, pages 294–310. Springer, 2018. 1, 2, 4, 7
  • Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2472–2481, 2018. 4