Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
CoRR (2024)
Abstract
Image customization has been extensively studied in text-to-image (T2I)
diffusion models, leading to impressive outcomes and applications. With the
emergence of text-to-video (T2V) diffusion models, the temporal counterpart,
motion customization, has not yet been well investigated. To address the
challenge of one-shot motion customization, we propose Customize-A-Video that
models the motion from a single reference video and adapts it to new subjects
and scenes with both spatial and temporal variations. It leverages low-rank
adaptation (LoRA) on temporal attention layers to tailor the pre-trained T2V
diffusion model for specific motion modeling from the reference videos. To
disentangle the spatial and temporal information during the training pipeline,
we introduce a novel concept of appearance absorbers that detach the original
appearance from the single reference video prior to motion learning. Our
proposed method can be easily extended to various downstream tasks, including
custom video generation and editing, video appearance customization, and
multiple motion combination, in a plug-and-play fashion. Our project page can
be found at https://anonymous-314.github.io.
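The core mechanism the abstract describes, LoRA applied only to the temporal attention layers of a pre-trained T2V model, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `LoRALinear` wrapper, the `rank`/`alpha` defaults, and the `temporal_attn` module-name filter in `add_temporal_lora` are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wrap a frozen linear projection with a trainable low-rank update:
    y = W x + (alpha / r) * B A x, with A: r x d_in and B: d_out x r."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pre-trained weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B
        nn.init.zeros_(self.up.weight)       # zero-init: starts as a no-op update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))


def add_temporal_lora(model: nn.Module, rank: int = 4):
    """Inject LoRA into the q/k/v projections of temporal attention blocks only,
    leaving spatial layers untouched. The name filter is a hypothetical
    convention; real T2V codebases name these modules differently."""
    for name, module in model.named_modules():
        if "temporal_attn" in name:
            for proj in ("to_q", "to_k", "to_v"):
                if hasattr(proj_layer := getattr(module, proj, None), "in_features"):
                    setattr(module, proj, LoRALinear(proj_layer, rank))
```

Because only the small `down`/`up` matrices are trainable, fine-tuning on a single reference video updates a tiny fraction of the model's parameters, which is what makes one-shot motion customization tractable.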