PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest
arxiv(2024)
Abstract
In this work, we present PoIFusion, a simple yet effective multi-modal 3D
object detection framework to fuse the information of RGB images and LiDAR
point clouds at the point of interest (abbreviated as PoI). Technically, our
PoIFusion follows the paradigm of query-based object detection, formulating
object queries as dynamic 3D boxes. The PoIs are adaptively generated from each
query box on the fly, serving as keypoints that represent a 3D object and act
as the basic units of multi-modal fusion. Specifically, we project the PoIs
into the view of each modality to sample the corresponding features and
integrate the multi-modal features at each PoI through a dynamic fusion block.
Furthermore, the features of PoIs derived from the same query box are
aggregated together to update the query feature. Our approach prevents
information loss caused by view transformation and eliminates the
computation-intensive global attention, making the multi-modal 3D object
detector more applicable. We conducted extensive experiments on the nuScenes
dataset to evaluate our approach. Remarkably, our PoIFusion achieves 74.9% NDS
and 73.4% mAP, setting a state-of-the-art record on the multi-modal 3D object
detection benchmark. Codes will be made available via .
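The abstract outlines a pipeline of adaptive PoI generation per query box, per-modality feature sampling, dynamic fusion, and aggregation back into the query feature. The paper's actual architecture is not given here; the following is a minimal illustrative sketch assuming a pinhole camera projection, bilinear sampling, and a simple weighted fusion stand-in — all function names, shapes, and the fusion rule are hypothetical, not the authors' implementation.

```python
import numpy as np

def generate_pois(box_center, offsets):
    # PoIs = box center plus adaptive offsets (offsets are hypothetical here;
    # in the paper they are predicted from the query on the fly)
    return box_center[None, :] + offsets  # (P, 3)

def project_to_image(pois, K):
    # pinhole projection of 3D PoIs into the image plane
    uvw = pois @ K.T                      # (P, 3)
    return uvw[:, :2] / uvw[:, 2:3]       # (P, 2) pixel coordinates

def bilinear_sample(feat, uv):
    # sample an (H, W, C) feature map at continuous uv locations
    H, W, _ = feat.shape
    u = np.clip(uv[:, 0], 0, W - 1.001)
    v = np.clip(uv[:, 1], 0, H - 1.001)
    u0, v0 = u.astype(int), v.astype(int)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    return ((1 - du) * (1 - dv) * feat[v0, u0]
            + du * (1 - dv) * feat[v0, u0 + 1]
            + (1 - du) * dv * feat[v0 + 1, u0]
            + du * dv * feat[v0 + 1, u0 + 1])

def fuse(img_feats, lidar_feats, w):
    # stand-in for the dynamic fusion block: a fixed convex combination
    # (the real block conditions its weights on the query feature)
    return w * img_feats + (1 - w) * lidar_feats

# toy usage with random features
rng = np.random.default_rng(0)
box_center = np.array([0.5, 0.5, 10.0])
offsets = rng.normal(scale=0.5, size=(8, 3))   # 8 PoIs per query box
K = np.array([[500., 0., 64.], [0., 500., 64.], [0., 0., 1.]])
img_feat = rng.normal(size=(128, 128, 16))     # image-view feature map
lidar_samples = rng.normal(size=(8, 16))       # pretend per-PoI LiDAR features

pois = generate_pois(box_center, offsets)
uv = project_to_image(pois, K)
img_samples = bilinear_sample(img_feat, uv)    # (8, 16)
fused = fuse(img_samples, lidar_samples, w=0.5)
query_update = fused.mean(axis=0)              # aggregate PoIs -> query feature
```

The key property this sketch mirrors is that fusion happens at sampled points rather than via a dense view transformation or global attention: each PoI pulls one feature vector from each modality, and only the per-box aggregate updates the query.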