EFN6D: an efficient RGB-D fusion network for 6D pose estimation

Journal of Ambient Intelligence and Humanized Computing (2024)

Abstract
Precise 6DoF (6D) object pose estimation is an essential topic for many intelligent applications, for example, robot grasping, virtual reality, and autonomous driving. Lacking depth information, traditional pose estimators that use only RGB cameras consistently predict biased 3D rotation and translation matrices. With the wide use of RGB-D cameras, we can directly capture both the depth of the object relative to the camera and the corresponding RGB image. Most existing methods concatenate these two data sources directly, which does not make full use of their complementary relationship. Therefore, we propose an efficient RGB-D fusion network for 6D pose estimation, called EFN6D, to exploit 2D–3D features more thoroughly. Instead of directly using the original single-channel depth map, we encode the depth information into a normal map and point cloud data. To effectively fuse the surface texture features and the geometric contour features of the object, we feed the RGB images and the normal map into two ResNets. Besides, PSP modules and skip connections are used between the two ResNets, which not only improves the network's cross-modal fusion performance but also strengthens its capability to handle objects at different scales. Finally, the fused features obtained from these two ResNets and the point cloud features are densely fused point by point to further strengthen the fusion of 2D and 3D information at a per-pixel level. Experiments on the LINEMOD and YCB-Video datasets show that our EFN6D outperforms state-of-the-art methods by a large margin.
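The dense per-pixel fusion described above can be sketched in a few lines: for each sampled object point, the 2D feature vector at that point's pixel is gathered from the fused RGB/normal-map feature grid and concatenated with the point's own 3D feature. The following is a minimal NumPy sketch of that idea only; all shapes, names, and values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative shapes (assumed, not from the paper):
H, W = 4, 6          # spatial size of the fused RGB/normal-map feature grid
C_2D, C_3D = 8, 5    # channel counts of 2D and per-point 3D features
N = 3                # number of sampled object points

rng = np.random.default_rng(0)
feat_2d = rng.standard_normal((C_2D, H, W))   # fused RGB + normal-map features
feat_3d = rng.standard_normal((N, C_3D))      # point cloud features, one row per point
pix = np.array([[0, 0], [2, 3], [3, 5]])      # (row, col) pixel of each sampled point

# Dense per-pixel fusion: gather the 2D feature at each point's pixel,
# then concatenate it with that point's 3D feature.
gathered = feat_2d[:, pix[:, 0], pix[:, 1]].T            # (N, C_2D)
fused = np.concatenate([gathered, feat_3d], axis=1)      # (N, C_2D + C_3D)
print(fused.shape)  # (3, 13)
```

In EFN6D this pairing happens per point rather than by concatenating whole maps, which is what lets the 2D texture cue and the 3D geometry cue stay aligned at the pixel level.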
Keywords
6D pose estimation,RGB-D,Feature fusion,Normal map