M-DETR: Multi-scale DETR for Optical Music Recognition

Expert Systems with Applications (2024)

Abstract
Optical Music Recognition (OMR) is an important means of digitizing score images and has broad application prospects in fields such as music document storage, music education, and digital creation. As a new paradigm for object detection, DETR (detection transformer) can associate contextual information, which can be exploited for the OMR task. However, the original DETR is poorly suited to OMR because of its high computational complexity and large number of parameters. To address these shortcomings and improve OMR recognition accuracy, we propose a novel multi-scale DETR (M-DETR) with a multi-scale feature fusion mechanism and improved attention mechanisms. First, a new multi-scale feature fusion mechanism is designed so that the backbone network of M-DETR captures rich multi-scale information. Then, a key-region attention mechanism is incorporated, exploiting the fact that the key information in a score image is concentrated. Finally, a pre-context attention mechanism is introduced to make better use of the contextual association between notes in music scores. Experimental results show that M-DETR achieves a recognition accuracy of 90.6% on 7 typical small-sized notes, which is better than Faster R-CNN and YOLO v5, with an improvement rate of 10.02% over the original DETR algorithm. The results indicate that M-DETR is an effective approach to the OMR task and also offers a new solution for detecting small-sized objects with contextual association.
Keywords
OMR,DETR,Feature fusion,Attention mechanism