Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark.
CoRR (2023)
Abstract
Current automatic lyrics transcription (ALT) benchmarks focus exclusively on
word content and ignore the finer nuances of written lyrics, including
formatting and punctuation, which leads to a potential misalignment with the
creative products of musicians and songwriters as well as listeners'
experiences. For example, line breaks are important in conveying information
about rhythm, emotional emphasis, rhyme, and high-level structure. To address
this issue, we introduce Jam-ALT, a new lyrics transcription benchmark based on
the JamendoLyrics dataset. Our contribution is twofold. First, we provide a
complete revision of the transcripts, geared specifically towards ALT
evaluation and following a newly created annotation guide that unifies the
music industry's guidelines and covers aspects such as punctuation, line
breaks, spelling, background vocals, and non-word sounds. Second, we provide a
suite of evaluation metrics designed, unlike the traditional word error rate,
to capture such phenomena. We
hope that the proposed benchmark contributes to the ALT task, enabling more
precise and reliable assessments of transcription systems and enhancing the
user experience in lyrics applications such as subtitle renderings for live
captioning or karaoke.
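
To illustrate why a word-only metric can miss formatting, the following is a minimal sketch in Python. It is not the metric suite proposed in the paper. It compares a plain word error rate (lowercased, punctuation and line breaks stripped) with a token error rate that also counts case, punctuation, and line breaks. The function names, the `<nl>` line-break token, and the example lyric are assumptions made up for this illustration.

```python
import re


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_tokens(text):
    """Lowercased words only; punctuation and line breaks are discarded."""
    return re.findall(r"[a-z0-9']+", text.lower())


def formatted_tokens(text):
    """Case-sensitive words, punctuation marks, and one <nl> token per line break."""
    marked = text.replace("\n", " <nl> ")
    return re.findall(r"[A-Za-z0-9']+|<nl>|[^\sA-Za-z0-9']", marked)


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Hypothetical example: the words are all correct, but the formatting is lost.
reference = "Hold on, my friend\nThe night is long"
hypothesis = "hold on my friend the night is long"

plain_wer = error_rate(word_tokens(reference), word_tokens(hypothesis))
formatted_rate = error_rate(formatted_tokens(reference),
                            formatted_tokens(hypothesis))

print(f"plain WER: {plain_wer:.2f}")                      # 0.00
print(f"formatting-aware rate: {formatted_rate:.2f}")     # 0.40
```

In this toy case the plain word error rate is zero even though the hypothesis drops the comma, the line break, and the capitalization, while the formatting-aware variant counts four errors over ten reference tokens.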