Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models

Alexandros G. Dimakis

CVPR, pp. 14519-14527, 2019.

Cited by: 10
Keywords:
TensorFlow Research Cloud, generative adversarial networks, natural image, Left to Right, two-dimensional geometry
We introduced a new type of local sparse attention layer designed for two-dimensional data

Abstract:

We introduce a new local sparse attention layer that preserves two-dimensional geometry and locality. We show that by just replacing the dense attention layer of SAGAN with our construction, we obtain very significant FID, Inception score and pure visual improvements. FID score is improved from $18.65$ to $15.94$ on ImageNet, keeping all other parameters of the architecture the same.

Introduction
  • Generative Adversarial Networks [10] are making significant progress on modeling and generating natural images [26, 4].
  • The central limitation is that convolutions fail to model complex geometries and long-distance dependencies; the canonical example is generating dogs with fewer or more than four legs.
  • To compensate for this limitation, attention layers [25] have been introduced in deep generative models [26, 4].
Highlights
  • Generative Adversarial Networks [10] are making significant progress on modeling and generating natural images [26, 4]
  • We introduce a new local sparse attention layer that preserves two-dimensional image locality and can support good information flow through attention steps
  • In addition to the significantly improved scores, an important benefit of using the YLG sparse layer instead of a dense attention layer is a significant reduction in the training time needed for the model to reach its optimal performance
  • We introduced a new type of local sparse attention layer designed for two-dimensional data
  • An interesting future direction is the design of attention layers, thought of as multi-step networks
  • We introduced information flow graphs as a mathematical abstraction and proposed full information as a desired criterion for such attention networks
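The full-information criterion can be checked mechanically: represent each attention step as a boolean mask and compute multi-step reachability over the resulting information flow graph. Below is a minimal sketch; the helper name `full_information` and the mask representation are our illustrative assumptions, not the paper's code.

```python
import numpy as np

def full_information(masks):
    """Return True if, after applying the attention steps in `masks`,
    every output position can receive information from every input
    position (the 'full information' criterion for the flow graph).

    masks: list of (n, n) boolean arrays; masks[t][i, j] = True means
    position i attends to position j at step t.
    """
    n = masks[0].shape[0]
    reach = np.eye(n, dtype=bool)  # initially each position knows only itself
    for m in masks:
        # i can now reach whatever any attended-to position j already reached
        reach = ((m.astype(int) @ reach.astype(int)) > 0) | reach
    return bool(reach.all())

dense = np.ones((4, 4), dtype=bool)                           # one dense step
local = np.eye(4, dtype=bool) | np.eye(4, k=-1, dtype=bool)   # self + left neighbor
```

A single dense layer trivially has full information, while a strictly left-looking local mask never does, no matter how many steps are stacked, since no position can gather information from its right.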
Methods
  • Experimental setup

    In this subsection, the authors will briefly describe the experimental setup for the inversion technique.
  • The authors use the recently introduced Lookahead [28] optimizer, as they find that it reduces the number of different seeds that must be tried for a successful inversion.
  • For the vast majority of the examined real images, the authors are able to get a satisfying inversion by trying at most 4 different seeds.
  • On a single V100 GPU, a single image inversion takes less than a minute to complete.
  • The authors choose to invert real-world images that were not present in the training set.
  • The authors initialize the latent variables from a truncated normal distribution, as explained in Section 9.2.
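The inversion setup above can be sketched end-to-end. For a self-contained example we stand in a toy linear map `W` for the trained generator and plain gradient descent for the Lookahead optimizer; the function names, step size, and step count are illustrative assumptions, not the paper's values.

```python
import numpy as np

def truncated_normal(rng, size, bound=2.0):
    """Sample from a standard normal truncated to [-bound, bound]."""
    z = rng.standard_normal(size)
    while np.any(np.abs(z) > bound):
        bad = np.abs(z) > bound
        z[bad] = rng.standard_normal(int(bad.sum()))
    return z

def invert(x, W, seed, steps=3000):
    """Gradient descent on ||W z - x||^2 from a truncated-normal init."""
    rng = np.random.default_rng(seed)
    z = truncated_normal(rng, W.shape[1])
    lr = 0.9 / (2 * np.linalg.norm(W, 2) ** 2)  # stable step for this quadratic
    for _ in range(steps):
        z -= lr * 2 * W.T @ (W @ z - x)
    return z, float(np.sum((W @ z - x) ** 2))

def best_inversion(x, W, max_seeds=4):
    """Retry a few seeds (the summary reports at most 4 suffice) and keep the best."""
    results = [invert(x, W, seed) for seed in range(max_seeds)]
    return min(results, key=lambda r: r[1])
```

For this convex toy objective a single seed already succeeds; with a real non-convex generator the seed retries are what make the procedure reliable.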
Results
  • As shown in Table 1, YLG-SAGAN (3rd row) outperforms SAGAN by a large margin measured by both FID and Inception score.
  • YLG-SAGAN increases Inception score to 57.22 (8.95% improvement) and improves FID to 15.94 (14.53% improvement).
  • In addition to the significantly improved scores, an important benefit of using the YLG sparse layer instead of a dense attention layer is that the authors observe a significant reduction in the training time needed for the model to reach its optimal performance.
  • Figure 4 illustrates SAGAN and YLG-SAGAN FID and Inception score as a function of the training time
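The reported percentages are easy to verify. Note that FID is a lower-is-better metric while Inception score is higher-is-better; the SAGAN baseline Inception score of 52.52 used below is implied by the reported 8.95% improvement rather than stated in this summary.

```python
def pct_improvement(old, new, higher_is_better=True):
    """Relative improvement of `new` over `old`, in percent."""
    delta = (new - old) if higher_is_better else (old - new)
    return 100.0 * delta / old

fid_gain = pct_improvement(18.65, 15.94, higher_is_better=False)  # FID drops
is_gain = pct_improvement(52.52, 57.22)                           # IS rises
print(round(fid_gain, 2), round(is_gain, 2))  # 14.53 8.95
```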
Conclusion
  • Conclusions and Future Work

    The authors introduced a new type of local sparse attention layer designed for two-dimensional data.
  • The authors' technique uses the discriminator in two ways: First, using its attention to obtain pixel importance and second, as a smoothing representation of the inversion loss landscape
  • This new inversion method allowed them to visualize the network on approximations of real images and to test how good a generative model is in this important coverage task.
  • The authors believe that this is the first key step towards using generative models for inverse problems, and they plan to explore this further in the future.
Tables
  • Table1: ImageNet Results
Related work
  • There has been a flourishing of novel ideas on making attention mechanisms more efficient. Dai et al. [7] separate inputs into chunks and associate a state vector with previous chunks of the input. Attention is performed per chunk, but information exchange between chunks is possible via the state vector. Guo et al. [12] show that a star-shaped topology can reduce attention cost from O(n^2) to O(n) in text sequences. Interestingly, this topology does have full information, under our framework. Sukhbaatar et al. [24] propose learning an adaptive attention span for each attention head.
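The O(n^2) to O(n) claim for the star topology can be seen by counting mask entries: every token attends to itself, its immediate neighbors, and one shared relay node, and the relay attends to everything. The sketch below is our simplified reading of the Star-Transformer pattern, not its exact layer:

```python
import numpy as np

def star_mask(n):
    """Attention mask for n tokens plus one relay node (index n).
    Tokens attend to themselves, their immediate neighbors, and the relay;
    the relay attends to every position."""
    m = np.zeros((n + 1, n + 1), dtype=bool)
    idx = np.arange(n)
    m[idx, idx] = True                # self
    m[idx[:-1], idx[1:]] = True       # right neighbor
    m[idx[1:], idx[:-1]] = True       # left neighbor
    m[:n, n] = True                   # every token attends to the relay
    m[n, :] = True                    # relay attends to everything
    return m

n = 1024
dense_cost = n * n                    # O(n^2) entries in a dense mask
star_cost = int(star_mask(n).sum())   # 5n - 1 entries: linear in n
```

Two hops through the relay connect any pair of positions, which is why this topology has full information under the paper's framework.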
Contributions
  • Introduces a new local sparse attention layer that preserves two-dimensional geometry and locality
  • Shows that by just replacing the dense attention layer of SAGAN with this construction, the authors obtain very significant FID, Inception score, and pure visual improvements
  • Proposes attention masks for the new layer, designed using a novel information-theoretic criterion that uses information flow graphs
  • Presents a novel way to invert Generative Adversarial Networks with attention
  • Introduces a new local sparse attention layer that preserves two-dimensional image locality and can support good information flow through attention steps
  • Trains on ImageNet-128 and achieves a 14.53% improvement in the FID score of SAGAN and an 8.95% improvement in Inception score, by only changing the attention layer while maintaining all other parameters of the architecture
Reference
  • [1] Rudolf Ahlswede, Ning Cai, S-Y. R. Li, and Raymond W. Yeung. Network information flow. IEEE Transactions on Information Theory, 46(4):1204–1216, 2000.
  • [2] David Bau, Jun-Yan Zhu, Jonas Wulff, William Peebles, Hendrik Strobelt, Bolei Zhou, and Antonio Torralba. Seeing What a GAN Cannot Generate. arXiv:1910.11626, 2019.
  • [3] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G. Dimakis. Compressed sensing using generative models. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 537–546. JMLR.org, 2017.
    (Displaced caption for Figure 7: Upper panel: YLG conditional image generation on different dog breeds from ImageNet. From top to bottom: eskimo husky, siberian husky, saint bernard, maltese. Lower panel: random generated samples from YLG-SAGAN. Additional generated images are included in the Appendix.)
  • [4] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv:1809.11096, 2018.
  • [5] Dan A. Calian, Peter Roelants, Jacques Cali, Ben Carr, Krishna Dubba, John E. Reid, and Dell Zhang. SCRAM: Spatially Coherent Randomized Attention Maps. arXiv:1905.10308, 2019.
  • [6] Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. Generating long sequences with sparse transformers. arXiv:1904.10509, 2019.
  • [7] Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. arXiv:1901.02860, 2019.
  • [8] Alexandros G. Dimakis, P. Brighten Godfrey, Yunnan Wu, Martin J. Wainwright, and Kannan Ramchandran. Network coding for distributed storage systems. IEEE Transactions on Information Theory, 56(9):4539–4551, 2010.
  • [9] Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. arXiv:1605.09782, 2016.
  • [10] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Networks. arXiv:1406.2661, 2014.
  • [11] Scott Gray, Alec Radford, and Diederik P. Kingma. GPU kernels for block-sparse weights. arXiv:1711.09224, 2017.
  • [12] Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, and Zheng Zhang. Star-Transformer. arXiv:1902.09113, 2019.
  • [13] Paul Hand and Vladislav Voroninski. Global guarantees for enforcing deep generative priors by empirical risk. IEEE Transactions on Information Theory, 2019.
  • [14] Maya Kabkab, Pouya Samangouei, and Rama Chellappa. Task-aware compressed sensing with generative adversarial networks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • [15] Tero Karras, Samuli Laine, and Timo Aila. A Style-Based Generator Architecture for Generative Adversarial Networks. arXiv:1812.04948, 2018.
  • [16] Zachary C. Lipton and Subarna Tripathi. Precise recovery of latent vectors from generative adversarial networks. arXiv:1702.04782, 2017.
  • [17] Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. Image Transformer. arXiv:1802.05751, 2018.
  • [18] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv:1511.06434, 2015.
  • [19] Ankit Raj, Yuqi Li, and Yoram Bresler. GAN-based projector for faster recovery with convergence guarantees in linear inverse problems. In Proceedings of the IEEE International Conference on Computer Vision, pages 5602–5611, 2019.
  • [20] J. H. Rick Chang, Chun-Liang Li, Barnabas Poczos, B. V. K. Vijaya Kumar, and Aswin C. Sankaranarayanan. One network to solve them all: solving linear inverse problems using deep projection models. In Proceedings of the IEEE International Conference on Computer Vision, pages 5888–5897, 2017.
  • [21] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
  • [22] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556, 2014.
  • [23] Ganlin Song, Zhou Fan, and John Lafferty. Surfing: Iterative optimization over incrementally trained deep networks. arXiv:1907.08653, 2019.
  • [24] Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, and Armand Joulin. Adaptive Attention Span in Transformers. arXiv:1905.07799, 2019.
  • [25] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention Is All You Need. arXiv:1706.03762, 2017.
  • [26] Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. Self-Attention Generative Adversarial Networks. arXiv:1805.08318, 2018.
  • [27] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris Metaxas. StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. arXiv:1612.03242, 2016.
  • [28] Michael R. Zhang, James Lucas, Geoffrey Hinton, and Jimmy Ba. Lookahead Optimizer: k steps forward, 1 step back. arXiv:1907.08610, 2019.