Semi-supervised Learning for Few-shot Image-to-Image Translation

CVPR, pp. 4452-4461, 2020.

Keywords: unpaired image, Alexei A. Efros, unsupervised image, I2I translation, Fréchet Inception Distance (16+ more)
Abstract:

In the last few years, unpaired image-to-image translation has witnessed remarkable progress. Although the latest methods are able to generate realistic images, they crucially rely on a large number of labeled images. Recently, some methods have tackled the challenging setting of few-shot image-to-image translation, reducing the labeled...

Introduction
  • Image-to-image (I2I) translation is an integral part of many computer vision tasks.
  • The scalability problem has been successfully alleviated [9, 35, 36, 41], enabling translation across several domains with a single model.
  • However, these approaches still suffer from two issues.
  • First, the target domain is required to contain the same categories or attributes as the source domain at test time, so they fail to scale to unseen categories (see Fig. 1(a)).
  • Second, they rely heavily on access to vast quantities of labeled data at train time (Fig. 1(a, b)).
  • Such labels provide useful information during training and play a key role in some settings.
Highlights
  • Image-to-image (I2I) translations are an integral part of many computer vision tasks
  • In order to further leverage the unlabeled images in the dataset, we use a cycle consistency constraint [48]. Such a cycle constraint has generally been used to guarantee content preservation in unpaired I2I translation [22, 46, 48, 28], but we propose here to use it to exploit the information contained in unlabeled images (see the sketch after this list).
  • Using the pseudo-labels provided by Noise-tolerant Pseudo-Labeling, we describe the actual training of the I2I translation model
  • We measure translation accuracy by the Top-1 and Top-5 accuracies of two classifiers, all and test. The former is trained on both source and target classes, while the latter is trained using only target classes.
  • We propose semi-supervised learning to perform few-shot unpaired I2I translation with fewer image labels for the source domain.
  • We employ a cycle consistency constraint to exploit the information in unlabeled data, as well as several generic modifications to make the I2I translation task easier.
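The appeal of the cycle constraint here is that it needs no labels: it only requires that a round trip through the target appearance reproduce the input. Below is a minimal PyTorch-style sketch of this idea; the callables pose_enc, app_enc, and generator stand in for Pφ, Aη, and GΦ, and the interface is an illustrative assumption, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(pose_enc, app_enc, generator, x_unlabeled, x_target):
    """Translate an unlabeled source image to the target appearance, then
    translate it back and penalize the reconstruction error.

    No label is needed: the constraint only requires that the round trip
    source -> target-like -> source reproduces the input image.
    """
    # Forward translation: pose from the unlabeled image, appearance from x_target.
    fake_tg = generator(pose_enc(x_unlabeled), app_enc(x_target))
    # Backward translation: pose from the fake image, appearance from the original.
    rec_src = generator(pose_enc(fake_tg), app_enc(x_unlabeled))
    # L1 reconstruction keeps the cycle sharp.
    return F.l1_loss(rec_src, x_unlabeled)
```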
Methods
  • Method overview

    As illustrated in Fig. 2(a), the model architecture consists of six sub-networks: Pose encoder Pφ, Appearance encoder Aη, Generator GΦ, Multilayer perceptron Mω, feature regulator F, and Discriminator Dξ, where the subscripts denote the parameters of each sub-network.
  • Let xsc ∈ X be the input source image, which provides pose information, and xtg ∈ X the target image, which contributes appearance, with corresponding class labels lsc and ltg.
  • The appearance code produced by Aη is mapped to the input parameters of the Adaptive Instance Normalization (AdaIN) layers [18] by the multilayer perceptron Mω (a minimal sketch follows this overview).
  • The authors expect GΦ to output a target-like image in terms of appearance, which should be classified with the corresponding label ltg.
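To make the AdaIN conditioning concrete, here is a minimal sketch of how a multilayer perceptron can map an appearance code to the per-channel scale and shift of an AdaIN layer. The class names (AdaIN, AppearanceMLP) and the layer sizes are assumptions for illustration; they are not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive Instance Normalization: normalize content features, then
    re-scale and shift them with appearance-derived parameters."""
    def forward(self, content, gamma, beta):
        # Instance statistics over spatial dimensions (per sample, per channel).
        mu = content.mean(dim=(2, 3), keepdim=True)
        sigma = content.std(dim=(2, 3), keepdim=True) + 1e-6
        normalized = (content - mu) / sigma
        # gamma/beta come from the appearance code via the MLP below.
        return gamma[..., None, None] * normalized + beta[..., None, None]

class AppearanceMLP(nn.Module):
    """Maps an appearance code to the (gamma, beta) pair of one AdaIN layer."""
    def __init__(self, app_dim, num_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(app_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 2 * num_channels),
        )
    def forward(self, app_code):
        gamma, beta = self.net(app_code).chunk(2, dim=1)
        return gamma, beta
```

In the generator, the pose features would then be modulated as `feat = adain(pose_feat, *mlp(app_code))`, so the appearance image controls the style of the output while the pose image controls its structure.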
Results
  • Evaluation metrics

    The authors consider the following three metrics. Two of them are the commonly used Inception Score (IS) [38] and Fréchet Inception Distance (FID) [17].
  • The authors use Translation Accuracy [28] to evaluate whether a model is able to generate images of the target class.
  • The authors measure translation accuracy by the Top-1 and Top-5 accuracies of two classifiers, all and test (see the sketch after this list).
  • The former is trained on both source and target classes, while the latter is trained using only target classes.
  • The authors compare against the following baselines (see the supplementary material).
  • FUNIT [28] is the first few-shot I2I translation method.
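A minimal sketch of how Top-1/Top-5 translation accuracy can be computed with a pretrained classifier (either the all or the test classifier); the function name and signature are illustrative assumptions, not the paper's evaluation code.

```python
import torch

@torch.no_grad()
def translation_accuracy(classifier, images, target_labels, k=5):
    """Top-1 / Top-k accuracy of a pretrained classifier on translated
    images: a translation counts as correct when the classifier ranks the
    intended target class within its top predictions."""
    logits = classifier(images)                  # (B, num_classes)
    topk = logits.topk(k, dim=1).indices         # (B, k), sorted by score
    hits = topk.eq(target_labels.unsqueeze(1))   # (B, k) boolean matches
    top1 = hits[:, 0].float().mean().item()      # best prediction only
    topk_acc = hits.any(dim=1).float().mean().item()
    return top1, topk_acc
```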
Conclusion
  • The authors proposed semi-supervised learning to perform few-shot unpaired I2I translation with fewer image labels for the source domain.
  • The authors employ a cycle consistency constraint to exploit the information in unlabeled data, as well as several generic modifications to make the I2I translation task easier.
  • The authors' method achieves excellent results on several datasets while requiring only a fraction of the labels.
Tables
  • Table1: Datasets used in the experiments.
  • Training objective (caption fragment): (a) an adversarial loss, which the discriminator tries to maximize while the other sub-networks try to minimize it; (b) a classification loss that ensures that the sub-nets {Pφ, Aη, Mω, GΦ} map source images xsc to target-like images; (c) an entropy regularization loss that enforces the pose feature to be class-invariant; and (d) a reconstruction loss that strengthens the connection between the translated images and the target image xtg, and guarantees that the translated images preserve the pose of the input source image xsc. A schematic combination of these terms follows this list.
  • Table2: Performance comparison with baselines on Animals [28]
  • Table3: Performance comparison with baselines on Birds [40]
  • Table4
  • Table5
  • Table6: Performance comparison with baselines on Animals++
  • Table7: Performance comparison with baselines on Birds++
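As a rough illustration of how the four terms above combine, the sketch below forms their weighted sum; the weight values are placeholder assumptions, not the paper's reported settings.

```python
def total_objective(adv_loss, cls_loss, entropy_reg, rec_loss,
                    lambda_cls=1.0, lambda_ent=0.1, lambda_rec=0.1):
    """Weighted sum of the four training terms listed above; the default
    weights are placeholder assumptions, not the paper's values."""
    return (adv_loss                    # (a) adversarial term
            + lambda_cls * cls_loss     # (b) classification of translated images
            + lambda_ent * entropy_reg  # (c) class-invariance of pose features
            + lambda_rec * rec_loss)    # (d) pose-preserving reconstruction
```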
Related work
  • Semi-supervised learning. The methods in this category employ a small set of labeled images and a large set of unlabeled data to learn a general data representation. Several works have explored applying semi-supervised learning to Generative Adversarial Networks (GANs). For example, [32, 38] merge the discriminator and the classifier into a single network. The generated samples are used as unlabeled samples to train the ladder network [32]. Springenberg [39] explored training a classifier in a semi-supervised, adversarial manner. Similarly, Li et al. [10] proposed Triple-GAN, which plays a minimax game among a generator, a discriminator, and a classifier. Other works [11, 12] either learn two-way conditional distributions of both the labels and the images, or add a new network to predict missing labels. Recently, Lucic et al. [29] proposed bottom-up and top-down methods to generate high-resolution images with fewer labels. To the best of our knowledge, no previous work addresses I2I translation to generate highly realistic images in a semi-supervised manner.
Contributions
  • Proposes applying semi-supervised learning via a noise-tolerant pseudo-labeling procedure (see the sketch after this list).
  • Proposes using semi-supervised learning to reduce the requirement for labeled source images and to effectively use unlabeled data.
  • Uses a cycle consistency constraint [48]. Such a cycle constraint has generally been used to guarantee content preservation in unpaired I2I translation [22, 46, 48, 28], but this work proposes using it to exploit the information contained in unlabeled images.
  • Proposes a novel application of OctConv to I2I translation, making this the first use of OctConv for a generative task.
  • Proposes several crucial modifications to facilitate this challenging setting.
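For intuition, the sketch below shows generic confidence-thresholded pseudo-labeling; the paper's noise-tolerant procedure is more involved, and the classifier, function name, and threshold value here are assumptions for illustration only.

```python
import torch

@torch.no_grad()
def pseudo_label(classifier, unlabeled_images, threshold=0.9):
    """Generic confidence-thresholded pseudo-labeling: keep only the
    predictions the classifier is confident about, since low-confidence
    pseudo-labels are the main source of label noise."""
    probs = torch.softmax(classifier(unlabeled_images), dim=1)
    confidence, labels = probs.max(dim=1)
    keep = confidence >= threshold       # drop likely-noisy assignments
    return unlabeled_images[keep], labels[keep]
```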
References
  • [1] Yazeed Alharbi, Neil Smith, and Peter Wonka. Latent filter scaling for multimodal unsupervised image-to-image translation. In CVPR, 2019.
  • [2] Matthew Amodio and Smita Krishnaswamy. TraVeLGAN: Image-to-image translation by transformation vector learning. In CVPR, 2019.
  • [3] Sagie Benaim and Lior Wolf. One-shot unsupervised cross domain translation. In NIPS, 2018.
  • [4] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, pages 2172–2180, 2016.
  • [5] Yue Chen, Yalong Bai, Wei Zhang, and Tao Mei. Destruction and construction learning for fine-grained image recognition. In CVPR, 2019.
  • [6] Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, and Jiashi Feng. Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution. arXiv preprint arXiv:1904.05049, 2019.
  • [7] Ying-Cong Chen, Xiaogang Xu, Zhuotao Tian, and Jiaya Jia. Homomorphic latent space interpolation for unpaired image-to-image translation. In CVPR, pages 2408–2416, 2019.
  • [8] Wonwoong Cho, Sungha Choi, David Keetae Park, Inkyu Shin, and Jaegul Choo. Image-to-image translation via group-wise deep whitening-and-coloring transformation. In CVPR, 2019.
  • [9] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In CVPR, 2018.
  • [10] Chongxuan Li, Taufik Xu, Jun Zhu, and Bo Zhang. Triple generative adversarial nets. In NIPS, pages 4088–4098, 2017.
  • [11] Zhijie Deng, Hao Zhang, Xiaodan Liang, Luona Yang, Shizhen Xu, Jun Zhu, and Eric P. Xing. Structured generative adversarial networks. In NIPS, 2017.
  • [12] Zhe Gan, Liqun Chen, Weiyao Wang, Yuchen Pu, Yizhe Zhang, Hao Liu, Chunyuan Li, and Lawrence Carin. Triangle generative adversarial networks. In NIPS, pages 5247–5256, 2017.
  • [13] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In CVPR, pages 2414–2423, 2016.
  • [14] Weifeng Ge, Xiangru Lin, and Yizhou Yu. Weakly supervised complementary parts models for fine-grained image classification from the bottom up. In CVPR, pages 3034–3043, 2019.
  • [15] Abel Gonzalez-Garcia, Joost van de Weijer, and Yoshua Bengio. Image-to-image translation for cross-domain disentanglement. In NIPS, pages 1294–1305, 2018.
  • [16] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
  • [17] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NIPS, pages 6626–6637, 2017.
  • [18] Xun Huang, Ming-Yu Liu, Serge Belongie, and Jan Kautz. Multimodal unsupervised image-to-image translation. In ECCV, pages 172–189, 2018.
  • [19] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.
  • [20] Yoshiyuki Kawano and Keiji Yanai. Automatic expansion of a food image dataset leveraging existing categories with domain adaptation. In ECCV Workshops, pages 3–17, 2014.
  • [21] Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop on Fine-Grained Visual Categorization, CVPR, 2011.
  • [22] Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jungkwon Lee, and Jiwon Kim. Learning to discover cross-domain relations with generative adversarial networks. In ICML, 2017.
  • [23] Kun Yi and Jianxin Wu. Probabilistic end-to-end noise correction for learning with noisy labels. In CVPR, 2019.
  • [24] Michael Lam, Behrooz Mahasseni, and Sinisa Todorovic. Fine-grained recognition as HSnet search for informative image parts. In CVPR, 2017.
  • [25] Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Kumar Singh, and Ming-Hsuan Yang. Diverse image-to-image translation via disentangled representations. In ECCV, 2018.
  • [26] Jianxin Lin, Yingce Xia, Sen Liu, Tao Qin, and Zhibo Chen. ZstGAN: An adversarial approach for unsupervised zero-shot image-to-image translation. arXiv preprint arXiv:1906.00184, 2019.
  • [27] Fayao Liu, Chunhua Shen, Guosheng Lin, and Ian Reid. Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. on PAMI, 38(10):2024–2039, 2016.
  • [28] Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, and Jan Kautz. Few-shot unsupervised image-to-image translation. In ICCV, pages 10551–10560, 2019.
  • [29] Mario Lucic, Michael Tschannen, Marvin Ritter, Xiaohua Zhai, Olivier Bachem, and Sylvain Gelly. High-fidelity image generation with fewer labels. In ICML, 2019.
  • [30] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In ICLR, 2018.
  • [31] Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. In ICVGIP, pages 722–729, 2008.
  • [32] Augustus Odena. Semi-supervised learning with generative adversarial networks. arXiv preprint arXiv:1606.01583, 2016.
  • [33] Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier GANs. In ICML, pages 2642–2651, 2017.
  • [34] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Workshop, 2017.
  • [35] Guim Perarnau, Joost van de Weijer, Bogdan Raducanu, and Jose M. Alvarez. Invertible conditional GANs for image editing. In NIPS Workshop on Adversarial Training, 2016.
  • [36] Andres Romero, Pablo Arbelaez, Luc Van Gool, and Radu Timofte. SMIT: Stochastic multi-label image-to-image translation. arXiv preprint arXiv:1812.03704, 2019.
  • [37] Kuniaki Saito, Yoshitaka Ushiku, and Tatsuya Harada. Asymmetric tri-training for unsupervised domain adaptation. In ICML, 2017.
  • [38] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In NIPS, pages 2234–2242, 2016.
  • [39] Jost Tobias Springenberg. Unsupervised and semi-supervised learning with categorical generative adversarial networks. In ICLR, 2016.
  • [40] Grant Van Horn, Steve Branson, Ryan Farrell, Scott Haber, Jessie Barry, Panos Ipeirotis, Pietro Perona, and Serge Belongie. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In CVPR, pages 595–604, 2015.
  • [41] Yaxing Wang, Abel Gonzalez-Garcia, Joost van de Weijer, and Luis Herranz. SDIT: Scalable and diverse cross-domain image translation. In ACM MM, 2019.
  • [42] Yaxing Wang, Joost van de Weijer, and Luis Herranz. Mix and match networks: Encoder-decoder alignment for zero-pair image translation. In CVPR, pages 5467–5476, 2018.
  • [43] Peter Welinder, Steve Branson, Takeshi Mita, Catherine Wah, Florian Schroff, Serge Belongie, and Pietro Perona. Caltech-UCSD Birds 200. Technical report, California Institute of Technology, 2010.
  • [44] Wayne Wu, Kaidi Cao, Cheng Li, Chen Qian, and Chen Change Loy. TransGaGa: Geometry-aware unsupervised image-to-image translation. In CVPR, 2019.
  • [45] Ze Yang, Tiange Luo, Dong Wang, Zhiqiang Hu, Jun Gao, and Liwei Wang. Learning to navigate for fine-grained classification. In ECCV, pages 420–435, 2018.
  • [46] Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong. DualGAN: Unsupervised dual learning for image-to-image translation. In ICCV, 2017.
  • [47] Weiwei Zhang, Jian Sun, and Xiaoou Tang. Cat head detection: How to effectively exploit shape and texture features. In ECCV, pages 802–816, 2008.
  • [48] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
  • [49] Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, and Eli Shechtman. Toward multimodal image-to-image translation. In NIPS, pages 465–476, 2017.