DLFusion: Painting-Depth Augmenting-LiDAR for Multimodal Fusion 3D Object Detection

MM '23: Proceedings of the 31st ACM International Conference on Multimedia (2023)

Abstract
Approaches that combine surround-view cameras with an image-to-3D depth transformation and then fuse the result with point cloud features have attracted considerable attention. However, lifting 2D features into 3D feature space through predefined sampling points and a depth distribution is applied across the entire scene, which produces a large number of redundant features. In addition, multimodal feature fusion unified in 3D space is usually performed only in the step immediately preceding the downstream task, ignoring interactive fusion across different scales. To this end, we design a new framework that endows images with 3D geometric perception information, unifies both modalities in voxel space to perform multi-scale interactive fusion, and mitigates feature misalignment between modalities through the geometric relationships between voxel features. The method has two main designs. First, a Segmentation-guided Image View Transformation module accurately transforms the pixel regions containing objects into a 3D pseudo-point voxel space with the help of a depth distribution, so that subsequent feature fusion can be performed on unified voxel features. Second, a Voxel-centric Consistent Fusion module alleviates the errors caused by depth estimation and achieves better feature fusion across the unified modalities. Through extensive experiments on the KITTI and nuScenes datasets, we validate the effectiveness of our camera-LiDAR fusion method. Our approach shows competitive performance on both datasets and outperforms state-of-the-art methods in certain classes of the 3D object detection benchmarks. Code: https://github.com/no-Name128/DLFusion
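To make the first design concrete, the sketch below illustrates a segmentation-guided view transformation in the Lift-Splat style the abstract alludes to: a per-pixel depth distribution lifts image features into a frustum of 3D pseudo-point features, and a predicted foreground mask suppresses background pixels so that only object regions are lifted. This is a minimal sketch under our own assumptions; the class name `SegGuidedLift`, the heads, and the threshold are hypothetical and not taken from the paper's implementation.

```python
import torch
import torch.nn as nn


class SegGuidedLift(nn.Module):
    """Hypothetical sketch: lift only foreground pixels into 3D
    pseudo-point features, weighted by a per-pixel depth distribution
    over D discrete depth bins (Lift-Splat style)."""

    def __init__(self, in_ch=256, depth_bins=64, fg_thresh=0.5):
        super().__init__()
        self.depth_head = nn.Conv2d(in_ch, depth_bins, 1)  # depth distribution logits
        self.seg_head = nn.Conv2d(in_ch, 1, 1)             # foreground probability
        self.fg_thresh = fg_thresh

    def forward(self, feat):                               # feat: (B, C, H, W)
        depth = self.depth_head(feat).softmax(dim=1)       # (B, D, H, W)
        fg = self.seg_head(feat).sigmoid()                 # (B, 1, H, W)
        keep = (fg > self.fg_thresh).float()               # zero out background pixels
        # Outer product of image features and masked depth scores yields a
        # frustum of 3D pseudo-point features, one feature per depth bin.
        frustum = feat.unsqueeze(2) * (depth * keep).unsqueeze(1)  # (B, C, D, H, W)
        return frustum


if __name__ == "__main__":
    lift = SegGuidedLift()
    x = torch.randn(2, 256, 32, 88)
    print(lift(x).shape)  # torch.Size([2, 256, 64, 32, 88])
```

The frustum features would then be scattered into voxel space using the camera calibration, which is where the redundancy saving comes from: masked background pixels contribute nothing to the voxel grid.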
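For the second design, a common way to fuse camera-derived voxels (whose positions depend on estimated depth) with LiDAR voxels is a learned per-voxel gate, so that voxels with unreliable depth lean on the LiDAR branch. The sketch below shows one such gated fusion applied at a single scale; `VoxelConsistentFusion` and its gating scheme are assumptions for illustration, not the paper's actual module.

```python
import torch
import torch.nn as nn


class VoxelConsistentFusion(nn.Module):
    """Hypothetical sketch: gated fusion of camera pseudo-point voxel
    features and LiDAR voxel features at matched voxel locations."""

    def __init__(self, ch):
        super().__init__()
        # Per-voxel gate predicted from both modalities.
        self.gate = nn.Sequential(nn.Conv3d(2 * ch, ch, 1), nn.Sigmoid())
        self.out = nn.Conv3d(ch, ch, 3, padding=1)

    def forward(self, cam_vox, lidar_vox):  # both: (B, C, Z, Y, X)
        g = self.gate(torch.cat([cam_vox, lidar_vox], dim=1))
        # Gate decides, per voxel and channel, how much to trust the
        # depth-estimated camera features versus the LiDAR features.
        fused = g * cam_vox + (1.0 - g) * lidar_vox
        return self.out(fused)


if __name__ == "__main__":
    fuse = VoxelConsistentFusion(ch=64)
    cam = torch.randn(1, 64, 8, 100, 100)
    lidar = torch.randn(1, 64, 8, 100, 100)
    print(fuse(cam, lidar).shape)  # torch.Size([1, 64, 8, 100, 100])
```

Multi-scale interactive fusion, as described in the abstract, would apply such a module at each resolution of the voxel backbone rather than only once before the detection head.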