Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning Framework for Monocular 3D Object Detection

arXiv (2023)

Abstract
Semi-supervised learning (SSL) has promising potential for improving model performance using both labelled and unlabelled data. Because recovering 3D information from 2D images is an ill-posed problem, current state-of-the-art methods for monocular 3D object detection (Mono3D) have relatively low precision and recall, which makes semi-supervised learning for Mono3D tasks challenging and understudied. In this work, we propose a unified and effective semi-supervised learning framework called Mix-Teaching that can be applied to most monocular 3D object detectors. Based on the idea of decomposition and recombination, unlabelled samples are first decomposed into collections of image patches with high-quality predictions and collections of background images containing no objects. The recombination operation then pastes these patches onto background images, producing mixed images that contain dense instances with high-quality pseudo-labels, and the student model is trained on these mixed images. In addition, we propose an uncertainty-based filter to distinguish high-quality pseudo-labels from noisy predictions during the decomposition process. On the KITTI and nuScenes benchmarks, Mix-Teaching consistently improves MonoFlex and GUPNet by significant margins under various labelling ratios. Our method improves AP(3D) by around 6.34% over GUPNet on the validation set when using only 10% labelled data. Using the full training set and the additional 38K raw images from KITTI, it further improves MonoFlex by 4.65% absolute AP(3D) for car detection, reaching 18.54% AP(3D), which ranks first among all monocular methods on the KITTI test leaderboard.
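The decomposition and recombination idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `decompose` and `recombine`, the prediction fields (`box`, `uncertainty`, `label`), the threshold value `0.2`, and the choice to paste each patch back at its original pixel location are all assumptions made for illustration. Images are plain nested lists so the example has no dependencies.

```python
def decompose(image, predictions, max_uncertainty=0.2):
    """Split one unlabelled image into confident patches (hypothetical helper).

    Each prediction is a dict with 'box' = (x1, y1, x2, y2), an 'uncertainty'
    score and a 'label'. The threshold value is an assumption.
    Returns (patches, is_background); an image counts as background only
    when the teacher predicted no objects at all.
    """
    # Uncertainty-based filter: keep only low-uncertainty predictions.
    confident = [p for p in predictions if p["uncertainty"] <= max_uncertainty]
    patches = []
    for p in confident:
        x1, y1, x2, y2 = p["box"]
        # Crop the patch pixels row by row.
        pixels = [row[x1:x2] for row in image[y1:y2]]
        patches.append({"box": p["box"], "pixels": pixels, "label": p["label"]})
    return patches, len(predictions) == 0


def recombine(background, patches):
    """Paste patches onto a copy of a background image (hypothetical helper).

    Assumption: each patch is pasted at its original box location.
    Returns the mixed image and the pseudo-labels for the student model.
    """
    mixed = [row[:] for row in background]  # do not modify the background
    pseudo_labels = []
    for patch in patches:
        x1, y1, x2, y2 = patch["box"]
        for dy, patch_row in enumerate(patch["pixels"]):
            mixed[y1 + dy][x1:x2] = patch_row
        pseudo_labels.append({"box": patch["box"], "label": patch["label"]})
    return mixed, pseudo_labels


# Toy usage: a 4x4 "image" of ones and an empty background of zeros.
image = [[1] * 4 for _ in range(4)]
background = [[0] * 4 for _ in range(4)]
predictions = [
    {"box": (0, 0, 2, 2), "uncertainty": 0.05, "label": "Car"},  # kept
    {"box": (2, 2, 4, 4), "uncertainty": 0.90, "label": "Car"},  # filtered out
]
patches, is_background = decompose(image, predictions)
mixed, pseudo_labels = recombine(background, patches)
```

In this toy run only the low-uncertainty prediction survives the filter, so the mixed image contains a single pasted patch and one pseudo-label. A real pipeline would draw patches from many unlabelled images to build the dense mixed images the abstract describes.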
Keywords
Semi-supervised learning, 3D object detection, autonomous driving