Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract
The excellent text-to-image synthesis capability of diffusion models has
driven progress in synthesizing coherent visual stories. The current
state-of-the-art method combines the features of historical captions,
historical frames, and the current captions as conditions for generating the
current frame. However, this method treats every historical frame and caption as
contributing equally: it concatenates them in order with uniform weights, ignoring
the fact that not all historical conditions are relevant to generating the
current frame. To address this issue, we propose Causal-Story. This model
incorporates a local causal attention mechanism that considers the causal
relationship between previous captions, frames, and current captions. By
assigning weights based on this relationship, Causal-Story generates the
current frame, thereby improving the global consistency of story generation. We
evaluated our model on the PororoSV and FlintstonesSV datasets, obtaining
state-of-the-art FID scores; the generated frames also exhibit better
visual storytelling.
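To make the core idea concrete, the sketch below illustrates a local causal attention pattern: each query position attends only to earlier positions within a limited window, so recent historical captions/frames receive weight while distant, unrelated ones are masked out. This is a minimal illustrative sketch, not the paper's actual implementation; the window size and all function names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_causal_attention(q, k, v, window=2):
    """Toy local causal attention.

    q: (T, d) queries for each frame position
    k, v: (T, d) keys/values built from historical captions and frames
    window: local span (hypothetical choice, not taken from the paper)
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Causal + local mask: position i may attend only to positions
    # j with i - window < j <= i (earlier positions in a local window).
    mask = np.full((T, T), -np.inf)
    for i in range(T):
        lo = max(0, i - window + 1)
        mask[i, lo:i + 1] = 0.0
    return softmax(scores + mask) @ v
```

Note that position 0 can only attend to itself, so its output equals `v[0]`; later positions blend a small, causally ordered neighborhood instead of weighting all history equally.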
Keywords
Training, Image synthesis, Diffusion model, Story visualization, Multi-modalities