Pix2Gif: Motion-Guided Diffusion for GIF Generation
arXiv (2024)
Abstract
We present Pix2Gif, a motion-guided diffusion model for image-to-GIF (video)
generation. We tackle this problem differently by formulating the task as an
image translation problem steered by text and motion magnitude prompts, as
shown in the teaser figure. To ensure that the model adheres to motion guidance, we
propose a new motion-guided warping module to spatially transform the features
of the source image conditioned on the two types of prompts. Furthermore, we
introduce a perceptual loss to ensure the transformed feature map remains
within the same space as the target image, ensuring content consistency and
coherence. In preparation for the model training, we meticulously curated data
by extracting coherent image frames from the TGIF video-caption dataset, which
provides rich information about the temporal changes of subjects. After
pretraining, we apply our model in a zero-shot manner to a number of video
datasets. Extensive qualitative and quantitative experiments demonstrate the
effectiveness of our model: it captures not only the semantic prompt from the
text but also the spatial cues from motion guidance. We train all our models
using a single node of 16xV100 GPUs. Code, dataset, and models are made public
at: https://hiteshk03.github.io/Pix2Gif/.
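
The abstract does not spell out how the motion-guided warping module or the perceptual loss operate; the released code is the authoritative reference. As a rough illustration only, the sketch below gives one plausible reading in PyTorch: a small head predicts a dense flow field from the source-image features concatenated with a broadcast conditioning vector (standing in for fused text and motion-magnitude embeddings), the features are warped with `grid_sample`, and the perceptual loss is taken as a simple feature-space MSE against the target frame's features. Every class name, shape, and design choice here is a hypothetical assumption, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionGuidedWarp(nn.Module):
    """Hypothetical sketch of a motion-guided warping module (not Pix2Gif's code):
    predict a dense 2D flow field from source features plus a conditioning
    vector, then spatially warp the features with that flow."""

    def __init__(self, feat_dim: int, cond_dim: int):
        super().__init__()
        self.flow_head = nn.Sequential(
            nn.Conv2d(feat_dim + cond_dim, feat_dim, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, 2, 3, padding=1),  # 2 channels: (dx, dy) offsets
        )

    def forward(self, feats: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feats.shape
        # Broadcast the conditioning vector over the spatial grid.
        cond_map = cond[:, :, None, None].expand(-1, -1, h, w)
        # Offsets are in grid_sample's normalized [-1, 1] coordinate space.
        flow = self.flow_head(torch.cat([feats, cond_map], dim=1))  # (B, 2, H, W)

        # Identity sampling grid in [-1, 1], the convention grid_sample expects.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feats.device),
            torch.linspace(-1, 1, w, device=feats.device),
            indexing="ij",
        )
        base = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        grid = base + flow.permute(0, 2, 3, 1)  # add predicted offsets: (B, H, W, 2)
        return F.grid_sample(feats, grid, align_corners=True)


def perceptual_loss(warped_feats: torch.Tensor, target_feats: torch.Tensor) -> torch.Tensor:
    # One simple reading of the abstract's perceptual loss: keep the warped
    # source features close to the target frame's features in feature space.
    return F.mse_loss(warped_feats, target_feats)


# Toy usage with random tensors standing in for real encoder features.
warp = MotionGuidedWarp(feat_dim=64, cond_dim=32)
feats = torch.randn(2, 64, 32, 32)   # source-frame features
cond = torch.randn(2, 32)            # fused text + motion-magnitude embedding
target = torch.randn(2, 64, 32, 32)  # target-frame features
loss = perceptual_loss(warp(feats, cond), target)
```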