MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage
arXiv (2024)
Abstract
This paper presents enhancements to the MT3 model, a state-of-the-art (SOTA)
token-based multi-instrument automatic music transcription (AMT) model. Despite
its SOTA performance, MT3 suffers from instrument leakage, where transcriptions
are fragmented across different instruments. To mitigate this, we propose
MR-MT3, which adds a memory retention mechanism, prior token
sampling, and token shuffling. These methods are evaluated on the
Slakh2100 dataset, demonstrating improved onset F1 scores and reduced
instrument leakage. In addition to the conventional multi-instrument
transcription F1 score, new metrics such as the instrument leakage ratio and
the instrument detection F1 score are introduced for a more comprehensive
assessment of transcription quality. The study also explores the issue of
domain overfitting by evaluating MT3 on single-instrument monophonic datasets
such as ComMU and NSynth. The findings, along with the source code, are shared
to facilitate future work aimed at refining token-based multi-instrument AMT
models.
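
The abstract names the instrument detection F1 score without defining it. As a minimal sketch, assuming the metric is a set-level F1 between the instruments present in the reference and those present in the transcription, it could be computed as follows. The function name and the use of integer program numbers are illustrative assumptions, not the authors' implementation.

```python
def instrument_detection_f1(reference, predicted):
    """Set-level F1 over instrument labels (assumed definition, not from the paper).

    reference, predicted: iterables of hashable instrument labels,
    e.g. MIDI program numbers found in each transcription.
    """
    ref, pred = set(reference), set(predicted)
    if not ref and not pred:
        return 1.0  # nothing to detect and nothing detected
    true_positives = len(ref & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the reference has two instruments, while the
# prediction spreads notes over two extra, spurious instruments.
score = instrument_detection_f1([0, 33], [0, 33, 48, 25])
```

Under this assumed definition, spurious instruments lower precision while recall stays at 1, so a transcription that fragments notes across extra instruments is penalized even when every true instrument is found.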