FaceCLIP: Facial Image-to-Video Translation via A Brief Text Description

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract
Existing image-to-video translation methods generally follow a frame-by-frame generative paradigm and extract temporal information from a reference video or an audio stream. Inspired by the recent success of text-guided image generation, we explore a more challenging but promising task, Text-guided Image-to-Video (TI2V) translation. Given an image and a brief text description as input, TI2V aims to generate a facial expression video that follows both the image and the text. To this end, we first propose an automatic video captioning pipeline that generates dense textual descriptions for facial video datasets, using both expression labels and action units. These dense textual descriptions provide precise semantic guidance for TI2V learning. We then design and train an efficient framework, FaceCLIP, on these datasets to address the TI2V translation task. FaceCLIP adopts a video autoencoder to model the temporal information of training videos, and a pretrained CLIP model to embed the video frames and the text description. We design a reconstruction loss and an embedding alignment loss to train the autoencoder so that it acquires text-guided video generation ability. Because expressions are closely tied to facial landmark motions, the reconstruction loss is applied to facial landmarks rather than to each video frame, which significantly enhances training efficiency. We compare FaceCLIP with several potential baseline methods and extensively evaluate its performance using multiple metrics. Qualitative and quantitative results validate the superiority of FaceCLIP in both visual quality and expression-text consistency. Moreover, the unique ability of FaceCLIP to generate videos from abstract texts demonstrates its stronger generalization capability.
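The abstract names two training signals, a reconstruction loss on facial landmarks and an embedding alignment loss between frame and text embeddings, but gives no formulas. The following is a minimal sketch of how such a combined objective could look, not the paper's actual formulation. The choice of mean squared error for the landmark term, cosine distance for the alignment term, the `align_weight` parameter, and all function names are assumptions made for illustration. Plain Python lists stand in for predicted landmarks and for the embeddings a pretrained CLIP model would produce.

```python
import math


def landmark_reconstruction_loss(pred, target):
    """Mean squared error over all (frame, landmark, coordinate) entries.

    pred and target are lists of frames; each frame is a list of (x, y)
    landmark positions. MSE is an assumed choice, not taken from the paper.
    """
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        for (px, py), (tx, ty) in zip(pred_frame, target_frame):
            total += (px - tx) ** 2 + (py - ty) ** 2
            count += 2
    return total / count


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embedding_alignment_loss(frame_embeddings, text_embedding):
    """One minus the average cosine similarity between frames and text.

    The embeddings are placeholders for CLIP image and text embeddings.
    """
    sims = [cosine_similarity(f, text_embedding) for f in frame_embeddings]
    return 1.0 - sum(sims) / len(sims)


def total_loss(pred_lm, target_lm, frame_emb, text_emb, align_weight=1.0):
    """Weighted sum of the two terms; align_weight is a hypothetical knob."""
    recon = landmark_reconstruction_loss(pred_lm, target_lm)
    align = embedding_alignment_loss(frame_emb, text_emb)
    return recon + align_weight * align
```

Computing the reconstruction term on a small set of landmark coordinates instead of full frames keeps the per-step cost low, which is consistent with the training-efficiency argument made in the abstract.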
Keywords
autoencoder, CLIP, facial video generation, image-to-video translation, text-guided, transformer