SCA-PVNet: Self-and-cross attention based aggregation of point cloud and multi-view for 3D object retrieval

Knowledge-Based Systems (2024)

Abstract
To address 3D object retrieval, substantial efforts have been made to generate highly discriminative descriptors for 3D objects represented by a single modality, such as voxels, point clouds, or multi-view images. It is promising to leverage complementary information from multimodal representations of 3D objects to further improve retrieval performance. However, multimodal 3D object retrieval has rarely been developed or analyzed on large-scale datasets. In this paper, we propose a self-and-cross-attention-based aggregation of point clouds and multi-view images (SCA-PVNet) for 3D object retrieval. With deep features extracted from point clouds and multi-view images, we design two types of feature aggregation modules, namely the in-modality aggregation module (IMAM) and the cross-modality aggregation module (CMAM), for effective feature fusion. IMAM leverages a self-attention mechanism to aggregate multi-view features, whereas CMAM exploits a cross-attention mechanism to enable interaction between point cloud and multi-view features. The final descriptor of a 3D object for retrieval is obtained by concatenating the aggregated feature outputs of the two modules. Extensive experiments and analyses were conducted on four datasets, ranging from small to large scale, to demonstrate the superiority of the proposed SCA-PVNet over state-of-the-art methods. In addition to achieving state-of-the-art retrieval performance, our method is more robust in challenging scenarios where views or points are missing during inference.
Keywords
Point cloud,Multi-view,Self-attention,Cross-attention,Multimodal 3D object retrieval
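The abstract describes the architecture only at a high level. For concreteness, the following is a minimal PyTorch sketch of the two aggregation modules and the concatenated descriptor, written from the abstract alone: all class names, dimensions, the mean-pooling step, and the choice of the global point-cloud feature as the cross-attention query are illustrative assumptions rather than the authors' implementation.

import torch
import torch.nn as nn

class IMAM(nn.Module):
    # In-modality aggregation: self-attention over the set of multi-view features.
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, view_feats):
        # view_feats: (batch, num_views, dim)
        out, _ = self.attn(view_feats, view_feats, view_feats)
        return out.mean(dim=1)  # pool the attended views into one vector

class CMAM(nn.Module):
    # Cross-modality aggregation: here the point-cloud feature attends to the views
    # (an assumed query/key assignment; the paper may define the interaction differently).
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, pc_feat, view_feats):
        # pc_feat: (batch, dim) global point-cloud feature, used as the query
        q = pc_feat.unsqueeze(1)  # (batch, 1, dim)
        out, _ = self.attn(q, view_feats, view_feats)
        return out.squeeze(1)

class SCAPVNetHead(nn.Module):
    # Final descriptor: concatenation of the two modules' outputs.
    def __init__(self, dim):
        super().__init__()
        self.imam = IMAM(dim)
        self.cmam = CMAM(dim)

    def forward(self, pc_feat, view_feats):
        return torch.cat([self.imam(view_feats),
                          self.cmam(pc_feat, view_feats)], dim=-1)

if __name__ == "__main__":
    head = SCAPVNetHead(dim=512)
    pc_feat = torch.randn(2, 512)         # e.g. from a point-cloud backbone
    view_feats = torch.randn(2, 12, 512)  # e.g. 12 rendered views per object
    print(head(pc_feat, view_feats).shape)  # torch.Size([2, 1024])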