MS-DETR: Efficient DETR Training with Mixed Supervision
CoRR (2024)
Abstract
DETR accomplishes end-to-end object detection by iteratively generating
multiple object candidates from image features and promoting one candidate
for each ground-truth object. The traditional training procedure, which uses
one-to-one supervision as in the original DETR, lacks direct supervision for
the object detection candidates.
We aim to improve DETR training efficiency by explicitly supervising
the candidate generation procedure, mixing one-to-one supervision with
one-to-many supervision. Our approach, MS-DETR, is simple: it places
one-to-many supervision on the object queries of the primary decoder that is
used for inference. In contrast to existing DETR variants with one-to-many
supervision, such as Group DETR and Hybrid DETR, our approach needs no
additional decoder branches or object queries. The object queries of the
primary decoder in our approach benefit directly from one-to-many supervision
and are thus superior in object candidate prediction. Experimental results show
that our approach outperforms related DETR variants, such as DN-DETR, Hybrid
DETR, and Group DETR, and that combining it with these variants further
improves performance.
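
As a rough illustration of the difference between the two supervision schemes, the following minimal sketch assigns ground-truth objects to the same set of object queries in both ways. It is not the authors' implementation. The match-quality matrix, the function names `one_to_one_match` and `one_to_many_match`, the top-k rule, and the values of `k` and `threshold` are all assumptions made for the example. The one-to-one assignment is found by brute force over permutations, which is only practical for toy sizes.

```python
from itertools import permutations


def one_to_one_match(quality):
    """Assign each ground truth to a distinct query, maximizing total quality.

    quality[q][g] is a hypothetical match quality between query q and
    ground truth g. Brute force stands in for an optimal assignment solver.
    """
    num_queries = len(quality)
    num_gt = len(quality[0]) if quality else 0
    best_total, best_assign = float("-inf"), ()
    for queries in permutations(range(num_queries), num_gt):
        total = sum(quality[q][g] for g, q in enumerate(queries))
        if total > best_total:
            best_total, best_assign = total, queries
    return {g: q for g, q in enumerate(best_assign)}


def one_to_many_match(quality, k=2, threshold=0.3):
    """Assign each ground truth to up to k queries (assumed top-k rule).

    Queries whose quality falls below the threshold are dropped, so a
    ground truth may end up with fewer than k positive queries.
    """
    num_gt = len(quality[0]) if quality else 0
    matches = {}
    for g in range(num_gt):
        ranked = sorted(range(len(quality)),
                        key=lambda q: quality[q][g], reverse=True)
        matches[g] = [q for q in ranked[:k] if quality[q][g] >= threshold]
    return matches


# Toy example: 4 object queries (rows), 2 ground-truth objects (columns).
quality = [
    [0.9, 0.1],
    [0.7, 0.2],
    [0.1, 0.8],
    [0.2, 0.1],
]

o2o = one_to_one_match(quality)
o2m = one_to_many_match(quality, k=2, threshold=0.3)
print(o2o)  # {0: 0, 1: 2}
print(o2m)  # {0: [0, 1], 1: [2]}
```

In this toy case, query 1 receives no positive label under one-to-one supervision but is a positive for ground truth 0 under one-to-many supervision. Both assignments refer to the same queries, so no extra decoder branch or query set appears in the sketch.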