VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
CoRR (2024)
Abstract
The arrival of Sora marks a new era for text-to-video diffusion models,
bringing significant advancements in video generation and potential
applications. However, Sora, along with other text-to-video diffusion models,
is highly reliant on prompts, and there is no publicly available dataset that
features a study of text-to-video prompts. In this paper, we introduce VidProM,
the first large-scale dataset comprising 1.67 million unique text-to-Video
Prompts from real users. Additionally, this dataset includes 6.69 million
videos generated by four state-of-the-art diffusion models, alongside some
related data. We initially discuss the curation of this large-scale dataset, a
process that is both time-consuming and costly. Subsequently, we underscore the
need for a new prompt dataset specifically designed for text-to-video
generation by illustrating how VidProM differs from DiffusionDB, a large-scale
prompt-gallery dataset for image generation. Our extensive and diverse dataset
also opens up many exciting new research areas. For instance, we suggest
exploring text-to-video prompt engineering, efficient video generation, and
video copy detection for diffusion models to develop better, more efficient,
and safer models. The project (including the collected dataset VidProM and
related code) is publicly available at https://vidprom.github.io under the
CC-BY-NC 4.0 License.