Diff-BGM: A Diffusion Model for Video Background Music Generation
CVPR 2024
Abstract
When editing a video, an attractive piece of background music is indispensable. However, video background music generation faces several challenges, such as the lack of suitable training datasets and the difficulty of flexibly controlling the music generation process and sequentially aligning the video and music. In this work, we first propose BGM909, a high-quality music-video dataset with detailed annotations and shot detection that provides multi-modal information about the video and music. We then present evaluation metrics to assess music quality, including music diversity and the alignment between music and video measured with retrieval precision metrics. Finally, we propose the Diff-BGM framework to automatically generate background music for a given video. It uses different signals to control different aspects of the music during generation: dynamic video features control the rhythm, while semantic features control the melody and atmosphere. We propose to align the video and music sequentially by introducing a segment-aware cross-attention layer. Experiments verify the effectiveness of our proposed method. The code and models are available at
https://github.com/sizhelee/Diff-BGM.
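
The abstract's segment-aware cross-attention layer restricts music tokens to attend to video features from the temporally corresponding segment. Below is a minimal PyTorch sketch of how such a layer could look; the class name `SegmentAwareCrossAttention`, the mask construction from per-token segment indices, and the toy shapes are illustrative assumptions for this page, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class SegmentAwareCrossAttention(nn.Module):
    """Hypothetical sketch: each music token attends only to video features
    whose segment index (e.g. derived from shot detection) matches its own,
    keeping the generated music locally aligned with the video."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, music_tokens, video_feats, music_seg_ids, video_seg_ids):
        # music_tokens: (B, Tm, D), video_feats: (B, Tv, D)
        # *_seg_ids: (B, Tm) / (B, Tv) integer segment indices
        # Block attention wherever the music token and video frame
        # belong to different segments.
        mask = music_seg_ids.unsqueeze(-1) != video_seg_ids.unsqueeze(1)  # (B, Tm, Tv)
        mask = mask.repeat_interleave(self.attn.num_heads, dim=0)         # (B*H, Tm, Tv)
        out, _ = self.attn(music_tokens, video_feats, video_feats, attn_mask=mask)
        return music_tokens + out  # residual connection


# toy usage with matching segment structure on both modalities
B, Tm, Tv, D = 1, 16, 32, 64
layer = SegmentAwareCrossAttention(D)
music = torch.randn(B, Tm, D)
video = torch.randn(B, Tv, D)
music_seg = (torch.arange(Tm) // 4).unsqueeze(0)   # 4 music segments
video_seg = (torch.arange(Tv) // 8).unsqueeze(0)   # 4 corresponding video segments
out = layer(music, video, music_seg, video_seg)    # (1, 16, 64)
```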