Generative Networks for Synthesizing Human Videos in Text-Defined Outfits

2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP)

Abstract
Generating a video from textual input is a challenging research topic with applications in industries such as retail, e-commerce, online entertainment, and education. In this paper, we address the task of generating videos of a human subject in a desired outfit from an input video of that subject. We present a two-stage solution: in the first stage, a generative model is learned such that, given the subject's image and a textual description of the outfit, a corresponding image of the subject in the described outfit is synthesized. In the second stage, every frame of the subject's video is processed individually by the stage-1 model, and an optical-flow-based post-processing step is applied to maintain visual coherence across the generated frames. Toward the stage-1 objective, we propose multiple supervised and unsupervised convolutional neural network (CNN) based generative models. We also present a novel approach that injects an external masking layer to maintain the structural integrity of the generated images. We train and test the different methods on the publicly available multi-view clothing image dataset, and we showcase the performance on videos using a set of real-world commercial videos. The experiments show the efficacy of our approach in generating images and videos at both low (64 × 64) and high (256 × 256) resolutions.
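The abstract does not specify how the optical-flow post-processing is implemented. As a minimal illustrative sketch only, one common way to enforce temporal coherence over independently generated frames is to warp the previous smoothed output toward the current frame using dense optical flow and blend the two; the sketch below assumes OpenCV's Farneback flow and a fixed blend weight, neither of which is confirmed by the paper.

```python
import cv2
import numpy as np

def temporally_smooth(frames, blend=0.5):
    """Blend each per-frame generator output with the previous smoothed
    output, warped into the current frame via dense optical flow.

    frames: list of HxWx3 uint8 BGR frames from the stage-1 model.
    blend:  weight of the current frame; (1 - blend) goes to warped history.
    """
    smoothed = [frames[0]]
    h, w = frames[0].shape[:2]
    # Pixel coordinate grid, reused when remapping each frame.
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    for t in range(1, len(frames)):
        curr_gray = cv2.cvtColor(frames[t], cv2.COLOR_BGR2GRAY)
        prev_gray = cv2.cvtColor(frames[t - 1], cv2.COLOR_BGR2GRAY)
        # Backward flow (current -> previous): for each pixel in the current
        # frame, where to sample from the previous frame.
        flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        map_x = grid_x + flow[..., 0]
        map_y = grid_y + flow[..., 1]
        # Warp the previous smoothed output into alignment with this frame.
        warped_prev = cv2.remap(smoothed[-1], map_x, map_y, cv2.INTER_LINEAR)
        out = cv2.addWeighted(frames[t], blend, warped_prev, 1.0 - blend, 0)
        smoothed.append(out)
    return smoothed
```

A larger `(1 - blend)` weight suppresses frame-to-frame flicker at the cost of some temporal lag; the actual paper may use a different flow estimator or blending rule.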
Keywords
Generative adversarial network, Convolutional neural network, Text-to-video generation