Towards Open-Ended Text-to-Face Generation, Combination and Manipulation

International Multimedia Conference (2022)

Abstract
Text-to-face (T2F) generation is an emerging research hotspot in multimedia, and its main challenge lies in the high-fidelity requirement of generated portraits. Many existing works resort to exploring the latent space of a pre-trained generator, e.g., StyleGAN, which has obvious shortcomings in efficiency and generalization ability. In this paper, we propose a generative network for open-ended text-to-face generation, termed OpenFaceGAN. Differing from existing StyleGAN-based methods, OpenFaceGAN constructs an effective multi-modal latent space that directly converts a natural language description into a face. This mapping paradigm fits the real data distribution well and makes the model capable of open-ended and even zero-shot T2F generation. Our method improves inference speed by an order of magnitude, e.g., 294 times faster than TediGAN. Based on OpenFaceGAN, we further explore text-guided face manipulation (editing). In particular, we propose a parameterized module, OpenEditor, to automatically disentangle the target latent code and update the original style information. OpenEditor also makes OpenFaceGAN directly applicable to most manipulation instructions without example-dependent searches or optimizations, greatly improving the efficiency of face manipulation. We conduct extensive experiments on two benchmark datasets, namely Multi-Modal CelebA-HQ and Face2Text-v1.0. The experimental results not only show the superior performance of OpenFaceGAN over existing T2F methods in both image quality and image-text matching, but also confirm its outstanding ability in zero-shot generation. Code will be released at: https://github.com/pengjunn/OpenFace
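To illustrate the direct text-to-latent mapping paradigm described in the abstract (a text embedding is projected into a latent code, which a generator then decodes into a face), the following minimal Python/PyTorch sketch shows the idea only; the module names, dimensions, and layers are illustrative assumptions and not the authors' released OpenFaceGAN implementation.

    # Conceptual sketch of a text -> latent -> face pipeline (assumed components).
    import torch
    import torch.nn as nn

    class TextToLatentMapper(nn.Module):
        """Maps a sentence embedding into a latent code for the generator."""
        def __init__(self, text_dim=512, latent_dim=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(text_dim, latent_dim),
                nn.LeakyReLU(0.2),
                nn.Linear(latent_dim, latent_dim),
            )

        def forward(self, text_emb):
            return self.net(text_emb)

    class ToyFaceGenerator(nn.Module):
        """Placeholder decoder: latent code -> 3x64x64 image tensor."""
        def __init__(self, latent_dim=512):
            super().__init__()
            self.fc = nn.Linear(latent_dim, 3 * 64 * 64)

        def forward(self, z):
            return torch.tanh(self.fc(z)).view(-1, 3, 64, 64)

    # Usage: one forward pass from a (stand-in) sentence embedding to an image.
    text_emb = torch.randn(1, 512)        # placeholder for a real text encoder output
    mapper, generator = TextToLatentMapper(), ToyFaceGenerator()
    image = generator(mapper(text_emb))   # shape: (1, 3, 64, 64)

The point of the sketch is the single feed-forward mapping from text to latent code: unlike per-example latent-space search or optimization in a pre-trained StyleGAN, generation here is one forward pass, which is why such a paradigm can be much faster at inference time.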