Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning

2021 IEEE Winter Conference on Applications of Computer Vision (WACV 2021)

Abstract
Existing unsupervised video-to-video translation methods fail to produce translated videos that are frame-wise realistic, preserve semantic information, and remain consistent at the video level. In this work, we propose UVIT, a novel unsupervised video-to-video translation model. Our model decomposes style and content, uses a specialized encoder-decoder structure, and propagates inter-frame information through bidirectional recurrent neural network (RNN) units. The style-content decomposition mechanism enables style-consistent video translation results and provides a clean interface for modality-flexible translation. In addition, by varying the input frames and style codes used during translation, we propose a video interpolation loss that captures temporal information within the sequence and trains our building blocks in a self-supervised manner. Our model produces photo-realistic, spatio-temporally consistent translated videos in a multimodal way. Subjective and objective experimental results validate the superiority of our model over existing methods.
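The video interpolation loss is only described at a high level above. Below is a minimal Python/PyTorch sketch of the idea: encode neighbouring frames into content codes, propagate them through a bidirectional GRU so each time step sees both past and future context, and decode each middle frame conditioned on a style code, supervised by the real middle frame. All module names, layer sizes, and the fusion scheme are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code): a toy content encoder, a
# bidirectional GRU for inter-frame propagation, and a style-conditioned
# decoder, trained to reconstruct each middle frame from its neighbours.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentEncoder(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, dim=16, style_dim=8):
        super().__init__()
        self.fuse = nn.Conv2d(dim + style_dim, dim, 3, padding=1)
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(dim, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, content, style):
        # Broadcast the style vector over spatial positions, then fuse.
        s = style[:, :, None, None].expand(-1, -1, *content.shape[2:])
        return self.up(F.relu(self.fuse(torch.cat([content, s], dim=1))))

def interpolation_loss(enc, rnn, dec, frames, style):
    """frames: (B, T, 3, H, W); reconstruct each middle frame from context."""
    b, t, c, h, w = frames.shape
    codes = enc(frames.reshape(b * t, c, h, w))
    d, hh, ww = codes.shape[1:]
    feat = d * hh * ww
    # Flatten spatial dims so the bidirectional GRU can run along time.
    fused, _ = rnn(codes.reshape(b, t, feat))  # (B, T, 2 * feat)
    loss = 0.0
    for i in range(1, t - 1):
        # Average forward (past context) and backward (future context) states.
        content = ((fused[:, i, :feat] + fused[:, i, feat:]) / 2).reshape(b, d, hh, ww)
        loss = loss + F.l1_loss(dec(content, style), frames[:, i])
    return loss / max(t - 2, 1)

# Toy usage: 5-frame clips of 32x32 images, one style code per clip.
enc, dec = ContentEncoder(dim=16), Decoder(dim=16)
feat = 16 * 8 * 8  # content-code size for 32x32 inputs (two stride-2 convs)
rnn = nn.GRU(feat, feat, batch_first=True, bidirectional=True)
frames, style = torch.randn(2, 5, 3, 32, 32), torch.randn(2, 8)
interpolation_loss(enc, rnn, dec, frames, style).backward()

The sketch keeps only the self-supervised reconstruction term; the full model would combine it with the usual unsupervised translation objectives.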
Keywords
unsupervised multimodal video-to-video translation, video-to-video translation methods, semantic information, video-level consistency, video-to-video translation model, specialized encoder-decoder structure, inter-frame information, bidirectional recurrent neural network units, style-content decomposition mechanism, style-consistent video translation results, modality-flexible translation, video interpolation loss, spatio-temporally consistent translated videos