Self-Supervised Intra-Modal and Cross-Modal Contrastive Learning for Point Cloud Understanding

IEEE TRANSACTIONS ON MULTIMEDIA (2024)

Abstract
Learning effective representations from unlabeled data is a challenging task in point cloud understanding. Motivated by the human visual system's ability to map concepts learned from 2D images onto the 3D world, and inspired by recent multimodal research, we jointly learn from the point cloud and image modalities. Building on the complementary properties of point clouds and images, we propose CrossNet, a comprehensive intra- and cross-modal contrastive learning method for 3D point cloud representations. The method establishes 3D-3D and 3D-2D correspondences by maximizing, in an invariant feature space, the consistency between point clouds and their augmented versions as well as between point clouds and their corresponding rendered images. We further separate the rendered images into RGB and grayscale versions to extract color and geometric features, respectively. These training objectives exploit feature correspondences across modalities to combine rich learning signals from both point clouds and images. CrossNet is simple: we add feature extraction and projection head modules to the point cloud and image branches and train the backbone networks in a self-supervised manner. After pretraining, only the point cloud feature extraction module is needed for fine-tuning and direct prediction on downstream tasks. Experiments on multiple benchmarks demonstrate improved point cloud classification and segmentation results, and the learned representations generalize across domains.
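
To make the training objective concrete, below is a minimal sketch of how the intra-modal (3D-3D) and cross-modal (3D-2D) consistency terms described above could be combined, assuming an NT-Xent-style contrastive loss computed in a shared invariant space. The encoder and projection-head names, the temperature, and the equal weighting of the loss terms are illustrative assumptions, not the paper's exact formulation.

# Minimal sketch of CrossNet-style joint intra-/cross-modal contrastive
# training. The NT-Xent loss, module names, temperature, and equal loss
# weighting are illustrative assumptions, not the paper's exact recipe.
import torch
import torch.nn.functional as F

def nt_xent(z_a, z_b, tau=0.1):
    """NT-Xent loss: matched rows of z_a/z_b are positives, all others negatives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau                        # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetrize so each view attracts its counterpart in the other view.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def crossnet_loss(point_encoder, image_encoder, proj_pc, proj_img,
                  pc, pc_aug, img_rgb, img_gray, tau=0.1):
    """Combine 3D-3D and 3D-2D consistency terms in a shared invariant space."""
    h_pc   = proj_pc(point_encoder(pc))         # anchor point cloud
    h_aug  = proj_pc(point_encoder(pc_aug))     # augmented cloud (intra-modal positive)
    h_rgb  = proj_img(image_encoder(img_rgb))   # rendered RGB image (color cue)
    h_gray = proj_img(image_encoder(img_gray))  # rendered grayscale image (geometric cue)
    intra = nt_xent(h_pc, h_aug, tau)                               # 3D-3D objective
    cross = nt_xent(h_pc, h_rgb, tau) + nt_xent(h_pc, h_gray, tau)  # 3D-2D objectives
    return intra + cross                        # equal weighting assumed for illustration

# Toy usage with stand-in encoders; the grayscale render is replicated to
# 3 channels so the shared image encoder sees one input shape (an assumption).
enc = lambda d_in, d_out: torch.nn.Sequential(torch.nn.Flatten(),
                                              torch.nn.Linear(d_in, d_out))
loss = crossnet_loss(enc(1024 * 3, 128), enc(3 * 32 * 32, 128),
                     torch.nn.Identity(), torch.nn.Identity(),
                     torch.randn(8, 1024, 3), torch.randn(8, 1024, 3),
                     torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32))

After pretraining with such an objective, only the point cloud encoder would be retained, consistent with the abstract's statement that the image branch is not needed for downstream fine-tuning.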
Keywords
Point cloud compression, Three-dimensional displays, Task analysis, Feature extraction, Self-supervised learning, Image color analysis, Visualization, Self-supervision, cross-modal learning, joint 3D-2D, point cloud understanding