DMF-GAN: Deep Multimodal Fusion Generative Adversarial Networks for Text-to-Image Synthesis

Bing Yang, Xueqin Xiang, Wangzeng Kong, Jianhai Zhang, Yong Peng

IEEE Transactions on Multimedia (2024)

Abstract
Text-to-image synthesis aims to generate high-quality, realistic images conditioned on a text description. The great challenge of this task lies in deeply and seamlessly integrating image and text information. Thus, in this paper, we propose a deep multimodal fusion generative adversarial network (DMF-GAN) that allows effective semantic interactions for fine-grained text-to-image generation. Specifically, through a novel recurrent semantic fusion network, DMF-GAN consistently manipulates the global assignment of text information among otherwise isolated fusion blocks. With the assistance of a multi-head attention module, DMF-GAN models word information from different perspectives and further improves semantic consistency. In addition, a word-level discriminator is proposed to provide the generator with fine-grained feedback related to each word. Compared with current state-of-the-art methods, our proposed DMF-GAN efficiently synthesizes realistic, text-aligned images and achieves better performance on challenging benchmarks. Code is available at: https://github.com/xueqinxiang/DMF-GAN
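The multi-head attention module mentioned above lets each head view the word features from a different learned perspective. The following is a minimal illustrative sketch of that general mechanism (not the authors' implementation): image-region features act as queries attending over word embeddings, with random matrices standing in for learned projection weights; all shapes and names are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(words, queries, num_heads, rng):
    """Toy multi-head attention: region queries attend to word features.

    words:   (n_words, d_model)  word embeddings (keys/values)
    queries: (n_regions, d_model) image-region features (queries)
    """
    d_model = words.shape[-1]
    d_head = d_model // num_heads
    # Random projections stand in for learned per-head weight matrices.
    Wq = rng.standard_normal((num_heads, d_model, d_head)) / np.sqrt(d_model)
    Wk = rng.standard_normal((num_heads, d_model, d_head)) / np.sqrt(d_model)
    Wv = rng.standard_normal((num_heads, d_model, d_head)) / np.sqrt(d_model)
    heads = []
    for h in range(num_heads):
        Q = queries @ Wq[h]                 # (n_regions, d_head)
        K = words @ Wk[h]                   # (n_words, d_head)
        V = words @ Wv[h]                   # (n_words, d_head)
        attn = softmax(Q @ K.T / np.sqrt(d_head))  # each head: its own word view
        heads.append(attn @ V)              # (n_regions, d_head)
    # Concatenating heads recombines the per-head perspectives.
    return np.concatenate(heads, axis=-1)   # (n_regions, d_model)

rng = np.random.default_rng(0)
words = rng.standard_normal((5, 32))    # 5 word embeddings of dim 32
regions = rng.standard_normal((8, 32))  # 8 image-region features
out = multi_head_attention(words, regions, num_heads=4, rng=rng)
print(out.shape)
```

In a trained model the projection matrices would be learned jointly with the generator, so each head specializes in a different aspect of the text (e.g. color words vs. object words).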
Keywords
Deep multimodal fusion, generative adversarial network, text-to-image (T2I) synthesis