Video Editing via Factorized Diffusion Distillation
arXiv (2024)
Abstract
We introduce Emu Video Edit (EVE), a model that establishes a new
state-of-the-art in video editing without relying on any supervised video
editing data. To develop EVE, we separately train an image editing adapter and a
video generation adapter, and attach both to the same text-to-image model.
Then, to align the adapters towards video editing, we introduce a new
unsupervised distillation procedure, Factorized Diffusion Distillation. This
procedure distills knowledge from one or more teachers simultaneously, without
any supervised data. We utilize this procedure to teach EVE to edit videos by
jointly distilling knowledge to (i) precisely edit each individual frame from
the image editing adapter, and (ii) ensure temporal consistency among the
edited frames using the video generation adapter. Finally, to demonstrate the
potential of our approach in unlocking other capabilities, we align additional
combinations of adapters