Modality Disentangled Discriminator for Text-to-Image Synthesis

IEEE Transactions on Multimedia (2022)

Abstract
Text-to-image (T2I) synthesis aims to generate photo-realistic images from text descriptions and is an important task in bridging vision and language. Each generated image consists of two parts: a content part related to the text and a style part irrelevant to the text. Existing discriminators do not distinguish between these two parts. This not only prevents T2I synthesis models from generating the content part effectively but also makes it difficult to manipulate the style of the generated image. In this paper, we propose a modality disentangled discriminator that separates the content part from the style part at a specific layer. Specifically, we use two losses to force a certain number of early layers in the discriminator to act as a disentangled representation extractor. The extracted common representation for the content part makes the discriminator more effective at capturing the text-image correlation, while the extracted modality-specific representation for the style part can be directly transferred to other images. Combining the two representations also improves the quality of the generated images. We use the proposed discriminator to replace the discriminator at each stage of the representative model AttnGAN and the SOTA model DM-GAN. Extensive experiments on three widely used T2I synthesis datasets, namely CUB, Oxford-102, and COCO, demonstrate the superior performance of the modality disentangled discriminator over the base models. Code for DM-GAN with our modality disentangled discriminator is available at https://github.com/FangxiangFeng/DM-GAN-MDD.
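The split into a content part and a style part can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the two losses or the layer at which the split happens, so the names `split_features`, `content_alignment_loss`, and `swap_style`, the slicing scheme, and the squared-distance loss are all assumptions chosen for illustration, with plain Python lists standing in for feature tensors.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class Disentangled:
    content: Vector  # common representation, meant to correlate with the text
    style: Vector    # modality-specific representation, irrelevant to the text


def split_features(features: Vector, content_dim: int) -> Disentangled:
    """Split the features from the chosen layer into content and style parts.

    Slicing by index is an assumption; the paper may separate them differently.
    """
    if not 0 < content_dim < len(features):
        raise ValueError("content_dim must leave both parts non-empty")
    return Disentangled(features[:content_dim], features[content_dim:])


def content_alignment_loss(rep: Disentangled, text_embedding: Vector) -> float:
    """Hypothetical stand-in for a loss tying the content part to the text.

    Uses the squared Euclidean distance to a text embedding of the same size.
    """
    if len(rep.content) != len(text_embedding):
        raise ValueError("content part and text embedding must match in size")
    return sum((c - t) ** 2 for c, t in zip(rep.content, text_embedding))


def swap_style(target: Disentangled, source: Disentangled) -> Vector:
    """Keep the target's content part and take the style part from the source."""
    return target.content + source.style


# Toy features for two images, split after the first two dimensions.
image_a = split_features([1.0, 2.0, 3.0, 4.0], content_dim=2)
image_b = split_features([5.0, 6.0, 7.0, 8.0], content_dim=2)
mixed = swap_style(image_a, image_b)
```

Here `mixed` keeps the content part of `image_a` and carries the style part of `image_b`, which mirrors the style transfer described in the abstract. In a real model the recombined features would be fed to a decoder or generator, which this sketch leaves out.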
Keywords
Task analysis, Correlation, Image synthesis, Image reconstruction, Generative adversarial networks, Image representation, Visualization, text-to-image synthesis, generative adversarial networks, multi-modal disentangled representation learning