Transformation GAN for Unsupervised Image Synthesis and Representation Learning

CVPR, pp. 469-478, 2020.

Keywords:
Transformation Generative Adversarial Networks, Frechet Inception Distance, image synthesis, supervised GAN, large scale

Abstract:

Generative Adversarial Networks (GAN) have shown promising performance in image synthesis and unsupervised learning (USL). In most cases, however, the representations extracted from an unsupervised GAN are unsatisfactory in other computer vision tasks. By using a conditional GAN (CGAN), this problem can be solved to some extent, but ...

Introduction
  • As a fundamental task in computer vision, representation learning has received lots of attention over the last decades.
  • The training methodology of deep neural networks is mainly driven by fully-supervised approaches with a large volume of labeled data.
  • Given this reliance on large volumes of labeled data, training DNN models effectively becomes highly challenging when only a limited amount of labeled data is available.
  • The conditional distributions p(t(x)|t) and p(t|t(x)) are critical for image transformation and transformation prediction, respectively
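
To make the transformation-prediction side concrete, below is a minimal sketch of learning p(t|x, t(x)): sample a random rotation t, apply it to an image x, and train a small predictor to recover t from the (x, t(x)) pair. The rotation family and the toy network are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformPredictor(nn.Module):
    """Toy network that predicts which of 4 rotations (0/90/180/270 deg)
    maps x to t(x); the two images are concatenated along channels."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 4)

    def forward(self, x, tx):
        h = self.features(torch.cat([x, tx], dim=1)).flatten(1)
        return self.head(h)

x = torch.randn(8, 3, 32, 32)                # a batch of images
t = torch.randint(0, 4, (8,))                # sampled transformation labels
tx = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                  for img, k in zip(x, t)])  # apply t to each sample
model = TransformPredictor()
loss = F.cross_entropy(model(x, tx), t)      # fit the prediction branch
loss.backward()
```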
Highlights
  • As a fundamental task in computer vision, representation learning has received lots of attention over the last decades
  • Our Transformation Generative Adversarial Network achieves better performance than the baseline conditional Generative Adversarial Network with respect to Frechet Inception Distance [13]
  • We propose a novel generative model, namely, Transformation Generative Adversarial Network (TrGAN)
  • As a combination of self-supervised learning and Generative Adversarial Networks, TrGAN retains the benefits of conditional Generative Adversarial Networks, such as stable training and visually sharper samples
  • To better utilize the meaningful features extracted by self-supervised learning, we introduce intermediate feature matching (IFM) methods to further guide the training of internal generator blocks
  • The experimental results, in terms of both Frechet Inception Distance and representation quality, demonstrate the effectiveness of our method
Methods
  • The authors compare the representation quality of TrGAN to other state-of-the-art self-supervised learning algorithms on ImageNet, including Scat + SVM [26], ExemplarCNN [7], DCGAN [27], Scattering [25], RotNet + FC [9], and AET-project + FC [39], alongside the Cond-GAN (upper bound) and AET-only baselines (a sketch of the feature-probe evaluation follows this list).
  • Under the same experimental settings as on CIFAR-10, the fully supervised counterpart, Cond-GAN, provides an upper bound on performance.
  • As shown in Table 6, TrGAN outperforms all the unsupervised baseline methods, including Context [5], Colorization [40], BiGAN [6] and DeepCluster [4].
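
The representation-quality numbers in Tables 3-6 come from probing learned features with a classifier. A hedged sketch of that protocol, assuming a frozen pretrained `encoder` and a trainable FC `classifier` (both stand-in names, not the paper's exact models):

```python
import torch

def top1_accuracy(encoder, classifier, loader, device="cpu"):
    """Evaluate a linear/FC probe on frozen features: the encoder is never
    updated; only the classifier head was trained beforehand."""
    encoder.eval(); classifier.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            feats = encoder(images.to(device))       # frozen features
            preds = classifier(feats).argmax(dim=1)  # FC head prediction
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return correct / total

# During probe training, only the head receives gradients, e.g.:
# opt = torch.optim.SGD(classifier.parameters(), lr=0.1)
```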
Results
  • The authors' TrGAN achieves better performance than the baseline conditional GAN with respect to FID [13].
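
FID [13] measures the Frechet distance between Gaussians fitted to Inception features of real and generated images. A minimal sketch of the standard computation, assuming `real_feats` and `fake_feats` are (N, 2048) arrays of pooled Inception activations computed elsewhere:

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(real_feats, fake_feats, eps=1e-6):
    """FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2))."""
    mu1, mu2 = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    sigma1 = np.cov(real_feats, rowvar=False)
    sigma2 = np.cov(fake_feats, rowvar=False)
    # Matrix square root of sigma1 @ sigma2, lightly regularized.
    covmean, _ = linalg.sqrtm(
        (sigma1 + eps * np.eye(len(mu1))) @ (sigma2 + eps * np.eye(len(mu2))),
        disp=False)
    covmean = covmean.real  # drop numerical imaginary residue
    diff = mu1 - mu2
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```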
Conclusion
  • The authors propose a novel generative model, namely, Transformation Generative Adversarial Network (TrGAN).
  • As a combination of self-supervised learning and GAN, TrGAN retains the benefits of conditional GAN, such as stable training and visually sharper samples.
  • To better utilize the meaningful features extracted by self-supervised learning, the authors introduce intermediate feature matching (IFM) to further guide the training of internal generator blocks (see the sketch after this list).
  • The authors show that this unsupervised generative model can be trained to attain better FID even than its conditional counterpart.
  • The experimental results, in terms of both FID and representation quality, demonstrate the effectiveness of the method
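
The summary above does not spell out the IFM objective, so the following is a speculative sketch of one plausible form: penalize the distance between features from internal generator blocks and encoder features at matching depths. The block/layer pairing and the L2 penalty are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ifm_loss(gen_feats, enc_feats):
    """gen_feats / enc_feats: lists of same-shaped tensors, one pair per
    matched generator block and encoder layer; the encoder side is
    detached so only the generator blocks are guided."""
    return sum(F.mse_loss(g, e.detach()) for g, e in zip(gen_feats, enc_feats))

# Two matched stages with illustrative shapes:
gen_feats = [torch.randn(4, 128, 16, 16, requires_grad=True),
             torch.randn(4, 64, 32, 32, requires_grad=True)]
enc_feats = [torch.randn(4, 128, 16, 16), torch.randn(4, 64, 32, 32)]
ifm_loss(gen_feats, enc_feats).backward()
```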
Tables
  • Table 1: Comparison between TrGAN and other baseline models on four datasets. The best Frechet Inception Distance (FID) scores are reported
  • Table 2: Ablation studies on the CIFAR-10 dataset. The best FID scores are reported
  • Table 3: Comparison between TrGAN, AET-only and Cond-GAN on CIFAR-10. Top-1 classification accuracy is reported
  • Table 4: Comparison with other unsupervised representation learning methods by top-1 accuracy on CIFAR-10
  • Table 5: Comparison between TrGAN, AET-only and Cond-GAN on ImageNet. Top-1 classification accuracy is reported
  • Table 6: Comparison with other unsupervised representation learning methods by top-1 accuracy on ImageNet
Related work
  • Auto-Encoder. One of the most representative unsupervised learning methods is the Auto-Encoder. During training, the encoder is trained to output representations from which the corresponding decoder can reconstruct the original images. The common belief is that features capable of reconstructing the input images must contain sufficient information about them. Many variants of the Auto-Encoder [14, 16, 33, 34] have been proposed, in which the encoder acts as an unsupervised feature extractor after being jointly trained with the decoder. For example, in the variational auto-encoder [16], the distribution of features from the encoder is constrained to a prior distribution. To learn more robust representations, the denoising auto-encoder [33] is designed to reconstruct noise-corrupted data, and the contractive auto-encoder [29] aims to extract representations invariant to small perturbations.
  • GAN. In recent years, GAN has gained significant popularity in image generation tasks. In practice, it handles generation by approximating a mapping between a low-dimensional noise distribution and the data distribution: a random noise vector z is fed into the generator G to obtain a sample G(z), and the discriminator D is required to distinguish between real samples and generated ones.
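
The adversarial game just described can be written down compactly. A minimal sketch with toy MLP placeholders for G and D (shapes and optimizers are illustrative, not any specific paper's setup):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 32 * 32 * 3))
D = nn.Sequential(nn.Linear(32 * 32 * 3, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(16, 32 * 32 * 3)  # stand-in for a batch of real images
z = torch.randn(16, 100)             # low-dimensional noise

# Discriminator step: push D(real) toward 1 and D(G(z)) toward 0.
d_loss = (F.binary_cross_entropy_with_logits(D(real), torch.ones(16, 1)) +
          F.binary_cross_entropy_with_logits(D(G(z).detach()), torch.zeros(16, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: make D classify G(z) as real.
g_loss = F.binary_cross_entropy_with_logits(D(G(z)), torch.ones(16, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```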
Funding
  • This work was supported in part to Prof
References
  • [1] Pulkit Agrawal, Joao Carreira, and Jitendra Malik. Learning to see by moving. In Proceedings of the International Conference on Computer Vision (ICCV), 2015.
  • [2] Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, and Gang Hua. CVAE-GAN: Fine-grained image generation through asymmetric training. In Proceedings of the International Conference on Computer Vision (ICCV), 2017.
  • [3] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
  • [4] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In European Conference on Computer Vision (ECCV), 2018.
  • [5] Carl Doersch, Abhinav Gupta, and Alexei A. Efros. Unsupervised visual representation learning by context prediction. In Proceedings of the International Conference on Computer Vision (ICCV), 2015.
  • [6] Jeff Donahue, Philipp Krahenbuhl, and Trevor Darrell. Adversarial feature learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • [7] Alexey Dosovitskiy, Philipp Fischer, Jost Tobias Springenberg, Martin A. Riedmiller, and Thomas Brox. Discriminative unsupervised feature learning with exemplar convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2014.
  • [8] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Alex Lamb, Martin Arjovsky, Olivier Mastropietro, and Aaron Courville. Adversarially learned inference. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • [9] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
  • [10] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS), 2014.
  • [11] Saurabh Gupta, Judy Hoffman, and Jitendra Malik. Cross modal distillation for supervision transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2827-2836, 2016.
  • [12] Saurabh Gupta, Judy Hoffman, and Jitendra Malik. Cross modal distillation for supervision transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • [13] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
  • [14] Geoffrey E. Hinton, Alex Krizhevsky, and Sida D. Wang. Transforming auto-encoders. In International Conference on Artificial Neural Networks (ICANN), 2011.
  • [15] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
  • [16] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • [17] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, 2009.
  • [18] Christian Ledig, Lucas Theis, Ferenc Huszar, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [19] Chongxuan Li, Kun Xu, Jun Zhu, and Bo Zhang. Triple generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
  • [20] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
  • [21] Takeru Miyato and Masanori Koyama. cGANs with projection discriminator. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
  • [22] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 115(3):211-252, 2015.
  • [23] Augustus Odena. Semi-supervised learning with generative adversarial networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • [24] Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the International Conference on Machine Learning (ICML), 2017.
  • [25] Edouard Oyallon, Eugene Belilovsky, and Sergey Zagoruyko. Scaling the scattering transform: Deep hybrid networks. In Proceedings of the International Conference on Computer Vision (ICCV), 2017.
  • [26] Edouard Oyallon and Stephane Mallat. Deep roto-translation scattering for object classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • [27] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • [28] Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
  • [29] Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot, and Yoshua Bengio. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the International Conference on Machine Learning (ICML), 2011.
  • [30] Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. FitNets: Hints for thin deep nets. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • [31] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
  • [32] Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lucic, and Neil Houlsby. Self-supervised generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [33] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the International Conference on Machine Learning (ICML), 2008.
  • [34] Jiayu Wang, Wengang Zhou, Jinhui Tang, Zhongqian Fu, Qi Tian, and Houqiang Li. Unregularized auto-encoder with generative adversarial networks for image generation. In ACM International Conference on Multimedia (ACM MM), 2018.
  • [35] Xun Huang, Yixuan Li, Omid Poursaeed, John Hopcroft, and Serge Belongie. Stacked generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [36] Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. Attribute2Image: Conditional image generation from visual attributes. In European Conference on Computer Vision (ECCV), 2016.
  • [37] Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
  • [38] Han Zhang, Ian J. Goodfellow, Dimitris N. Metaxas, and Augustus Odena. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318, 2018.
  • [39] Liheng Zhang, Guo-Jun Qi, Liqiang Wang, and Jiebo Luo. AET vs. AED: Unsupervised representation learning by auto-encoding transformations rather than data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [40] Richard Zhang, Phillip Isola, and Alexei A. Efros. Colorful image colorization. In European Conference on Computer Vision (ECCV), 2016.
  • [41] Richard Zhang, Phillip Isola, and Alexei A. Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [42] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the International Conference on Computer Vision (ICCV), 2017.