Intelligent Grimm – Open-ended Visual Storytelling via Latent Diffusion Models
CVPR 2024(2023)
摘要
Generative models have recently exhibited exceptional capabilities in
text-to-image generation, but still struggle to generate image sequences
coherently. In this work, we focus on a novel, yet challenging task of
generating a coherent image sequence based on a given storyline, denoted as
open-ended visual storytelling. We make the following three contributions: (i)
to fulfill the task of visual storytelling, we propose a learning-based
auto-regressive image generation model, termed as StoryGen, with a novel
vision-language context module, that enables to generate the current frame by
conditioning on the corresponding text prompt and preceding image-caption
pairs; (ii) to address the data shortage of visual storytelling, we collect
paired image-text sequences by sourcing from online videos and open-source
E-books, establishing processing pipeline for constructing a large-scale
dataset with diverse characters, storylines, and artistic styles, named
StorySalon; (iii) Quantitative experiments and human evaluations have validated
the superiority of our StoryGen, where we show StoryGen can generalize to
unseen characters without any optimization, and generate image sequences with
coherent content and consistent character. Code, dataset, and models are
available at https://haoningwu3639.github.io/StoryGen_Webpage/
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要