Multi-Conditional Generative Adversarial Network for Text-to-Video Synthesis

Rui Zhou, Cong Jiang, Qingyang Xu, Yibin Li, Chengjin Zhang, Yong Song

Journal of Computer-Aided Design & Computer Graphics (2022)

Abstract
To address the strong randomness of current mainstream text-to-video models and their limited ability to synthesize complex scenes and diverse motion videos, a text-to-video method based on multi-conditional generative adversarial networks is proposed. It comprises a text processing module, a pose modeling and transition module, and a video frame generation and optimization module. The text processing module combines traditional generation methods (retrieval and supervised learning) with generative models and builds an action retrieval database to improve the controllability of the generation process. The pose modeling and transition module extracts pose information and performs 3D modeling. The video frame generation and optimization module uses multi-conditional generative adversarial networks to synthesize and refine video frames. The model is validated on public datasets such as iPER and DeepFashion and evaluated with IS, SSIM, PSNR, and other metrics. Compared with existing models, the proposed model achieves better semantic consistency and video quality. Against MonkeyNet, a current mainstream pose-transfer model, SSIM on the iPER dataset improves by 16.8%, IS by 22.7%, and PSNR by 27.1%. For pose transition on the baseline DeepFashion dataset, the FreID value improves by 26.7%.
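Of the evaluation metrics named above, PSNR is the simplest to state concretely: it measures frame fidelity as the log-ratio of the maximum pixel value to the mean squared error between a reference frame and a generated frame. The sketch below is illustrative only (it is not the paper's evaluation code); the function name and the toy frames are assumptions.

```python
import numpy as np

def psnr(ref, gen, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between a reference and a
    generated frame. Higher is better; identical frames give infinity."""
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: an 8-bit gray frame shifted by a constant 8 levels,
# so MSE = 64 and PSNR = 10 * log10(255^2 / 64) ≈ 30.07 dB.
ref = np.full((64, 64), 128, dtype=np.uint8)
gen = ref + 8  # 128 + 8 = 136, safely below the uint8 ceiling
print(round(psnr(ref, gen), 2))  # → 30.07
```

SSIM, by contrast, compares local luminance, contrast, and structure statistics rather than raw pixel error, which is why the paper reports both: PSNR tracks pixel fidelity while SSIM better reflects perceived quality.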
Keywords
synthesis, adversarial, multi-conditional, text-to-video