MS-DETR: Efficient DETR Training with Mixed Supervision
CoRR (2024)
Abstract
DETR performs end-to-end object detection by iteratively generating
multiple object candidates from image features and promoting one candidate
for each ground-truth object. The traditional training procedure in the
original DETR uses one-to-one supervision, which provides no direct
supervision for the object detection candidates.
We aim to improve DETR training efficiency by explicitly supervising
the candidate generation procedure through mixing one-to-one supervision and
one-to-many supervision. Our approach, MS-DETR, is simple: it applies
one-to-many supervision to the object queries of the primary decoder that is
used for inference. In comparison to existing DETR variants with one-to-many
supervision, such as Group DETR and Hybrid DETR, our approach does not need
additional decoder branches or object queries. The object queries of the
primary decoder in our approach directly benefit from one-to-many supervision
and are thus better at object candidate prediction. Experimental results show
that our approach outperforms related DETR variants, such as DN-DETR, Hybrid
DETR, and Group DETR, and that combining it with these variants further
improves performance.
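
The idea of mixing the two kinds of supervision on a single set of object queries can be illustrated with a small sketch. This is not the paper's implementation: the brute-force matcher, the top-k rule, the score threshold, and all function names below are hypothetical choices made only to show how one set of queries could receive both a one-to-one target and a one-to-many target, without extra decoder branches or extra queries.

```python
from itertools import permutations


def one_to_one_assign(scores):
    """scores[g][q] is a matching score between ground truth g and query q.

    Returns a list whose entry g is the query assigned to ground truth g,
    chosen to maximize the total score with each query used at most once.
    Brute force over permutations; only suitable for tiny examples.
    """
    num_gt, num_q = len(scores), len(scores[0])
    best, best_total = None, float("-inf")
    for perm in permutations(range(num_q), num_gt):
        total = sum(scores[g][q] for g, q in enumerate(perm))
        if total > best_total:
            best, best_total = list(perm), total
    return best


def one_to_many_assign(scores, k=2, threshold=0.4):
    """For each ground truth, keep up to k top-scoring queries with
    score >= threshold (k and threshold are illustrative assumptions)."""
    assigned = []
    for row in scores:
        ranked = sorted(range(len(row)), key=lambda q: row[q], reverse=True)
        assigned.append([q for q in ranked[:k] if row[q] >= threshold])
    return assigned


def mixed_targets(scores, k=2, threshold=0.4):
    """Return per-query targets under both schemes for the SAME queries.

    Each output list has one entry per query: the index of the ground truth
    it is supervised with, or None if it is treated as background.
    """
    num_q = len(scores[0])
    o2o = [None] * num_q
    o2m = [None] * num_q
    for g, q in enumerate(one_to_one_assign(scores)):
        o2o[q] = g
    for g, queries in enumerate(one_to_many_assign(scores, k, threshold)):
        for q in queries:
            # If a query qualifies for several ground truths, keep the best one.
            if o2m[q] is None or scores[g][q] > scores[o2m[q]][q]:
                o2m[q] = g
    return o2o, o2m


# Two ground-truth objects, four object queries (made-up scores).
scores = [
    [0.9, 0.7, 0.1, 0.2],
    [0.1, 0.3, 0.8, 0.5],
]
o2o, o2m = mixed_targets(scores)
print(o2o)  # [0, None, 1, None]
print(o2m)  # [0, 0, 1, 1]
```

In this toy case the one-to-one targets leave queries 1 and 3 unsupervised as candidates, while the one-to-many targets also assign them to a ground-truth object, which is the kind of direct candidate supervision the abstract describes.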