Visual Foundation Models Boost Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation
arXiv (2024)
Abstract
Unsupervised domain adaptation (UDA) is vital for alleviating the workload of
labeling 3D point cloud data and mitigating the absence of labels when facing a
newly defined domain. Various methods of utilizing images to enhance the
performance of cross-domain 3D segmentation have recently emerged. However, the
pseudo labels, which are generated by models trained on the source domain and
provide additional supervision for the unseen domain, are inherently noisy,
which makes them inadequate for 3D segmentation and limits the accuracy of the
resulting networks. With the advent of 2D visual foundation models (VFMs) and
their abundant prior knowledge, we propose VFMSeg, a novel pipeline that
leverages these models to further enhance the cross-modal unsupervised domain
adaptation framework. In this work, we study how to
harness the knowledge priors learned by VFMs to produce more accurate labels
for unlabeled target domains and improve overall performance. We first use
a multi-modal VFM, pre-trained on large-scale image-text pairs, to
provide supervised labels (VFM-PL) for images and point clouds from the target
domain. Then, another VFM, trained on fine-grained 2D masks, guides
the generation of semantically augmented images and point clouds that mix data
from the source and target domains in the manner of view frustums
(FrustumMixing), which improves network performance. Finally, we merge class-wise
predictions across modalities to produce more accurate annotations for unlabeled
target domains. Our method is evaluated on various autonomous driving datasets,
and the results demonstrate a significant improvement on the 3D segmentation task.
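
The final merging step can be illustrated with a minimal sketch. The abstract does not specify how predictions are merged, so everything below is an assumption made for illustration and not the authors' implementation: the function name merge_pseudo_labels, the equal-weight averaging of 2D and 3D class probabilities, the confidence threshold of 0.6, and the ignore marker -100 are all hypothetical.

```python
import numpy as np

IGNORE_LABEL = -100  # hypothetical marker for points left without a pseudo label


def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def merge_pseudo_labels(logits_2d, logits_3d, threshold=0.6):
    """Average per-point class probabilities from two modalities.

    Assumption: both inputs have shape (num_points, num_classes) and are
    already aligned point by point. Points whose merged confidence falls
    below the threshold are marked IGNORE_LABEL.
    """
    probs = 0.5 * (softmax(logits_2d) + softmax(logits_3d))
    labels = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    labels[confidence < threshold] = IGNORE_LABEL
    return labels


# Toy example: 3 points, 3 classes.
logits_2d = np.array([[4.0, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 3.0, 0.0]])
logits_3d = np.array([[3.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 3.0]])
labels = merge_pseudo_labels(logits_2d, logits_3d)
print(labels)
```

In this toy input, the first point keeps its label because both modalities agree with high confidence. The second point (both modalities uncertain) and the third (the modalities disagree) fall below the threshold and are ignored, which is one simple way to limit the pseudo-label noise the abstract describes.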