Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
CoRR (2024)
Abstract
Text-to-video generation marks a significant frontier in the rapidly evolving
domain of generative AI, integrating advancements in text-to-image synthesis,
video captioning, and text-guided editing. This survey critically examines the
progression of text-to-video technologies, focusing on the shift from
traditional generative models to the cutting-edge Sora model, highlighting
developments in scalability and generalizability. Distinguishing our analysis
from prior works, we offer an in-depth exploration of the technological
frameworks and evolutionary pathways of these models. Additionally, we delve
into practical applications and address ethical and technological challenges,
such as the inability to handle multiple entities, learn cause and effect,
understand physical interaction, perceive object scale and proportion, and
avoid object hallucination, a long-standing problem in generative models. Our
discussion covers the use of text-to-video generation models as human-assistive
tools and world models, identifies the models' shortcomings, and summarizes
future directions for improvement, which center mainly on training datasets and
evaluation metrics (both automatic and human-centered). Aimed at
both newcomers and seasoned researchers, this survey seeks to catalyze further
innovation and discussion in the growing field of text-to-video generation,
paving the way for more reliable and practical generative artificial
intelligence technologies.