The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation.
CoRR (2023)
Abstract
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of
high-quality audio-caption pairs, designed for the evaluation of
music-and-language models. The dataset consists of 1.1k human-written natural
language descriptions of 706 music recordings, all publicly accessible and
released under Creative Commons licenses. To showcase the use of our dataset, we
benchmark popular models on three key music-and-language tasks (music
captioning, text-to-music generation and music-language retrieval). Our
experiments highlight the importance of cross-dataset evaluation and offer
insights into how researchers can use SDD to gain a broader understanding of
model performance.