OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
arXiv (2024)
Abstract
In the current state of 3D object detection research, the severe scarcity of
annotated 3D data, substantial disparities across data modalities, and the
absence of a unified architecture have impeded progress toward the goal of
universality. In this paper, we propose OV-Uni3DETR, a
unified open-vocabulary 3D detector via cycle-modality propagation. Compared
with existing 3D detectors, OV-Uni3DETR offers distinct advantages: 1)
Open-vocabulary 3D detection: During training, it leverages various accessible
data, especially extensive 2D detection images, to boost training diversity.
During inference, it can detect both seen and unseen classes. 2) Modality
unifying: It seamlessly accommodates input data from any given modality,
effectively addressing scenarios involving disparate modalities or missing
sensor information, thereby supporting test-time modality switching. 3) Scene
unifying: It provides a unified multi-modal model architecture for diverse
scenes collected by distinct sensors. Specifically, we propose the
cycle-modality propagation, aimed at propagating knowledge bridging 2D and 3D
modalities, to support the aforementioned functionalities. 2D semantic
knowledge from large-vocabulary learning guides novel class discovery in the 3D
domain, and 3D geometric knowledge provides localization supervision for 2D
detection images. OV-Uni3DETR achieves state-of-the-art performance across
various scenarios, surpassing existing methods by more than 6% on average. Its
performance using only RGB images is on par with, or even surpasses, that of
previous point-cloud-based methods. Code and pre-trained models will be
released later.