Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding
arXiv (2023)
Abstract
Zero-shot 3D point cloud understanding can be achieved via 2D Vision-Language
Models (VLMs). Existing strategies directly map VLM representations from the 2D
pixels of rendered or captured views to 3D points, overlooking the inherent and
expressible geometric structure of the point cloud. Geometrically similar or
close regions can be exploited to bolster point cloud understanding, as they are
likely to share semantic information. To this end, we introduce the first
training-free aggregation technique that leverages the point cloud's 3D
geometric structure to improve the quality of the transferred VLM
representations. Our approach operates iteratively, performing local-to-global
aggregation based on geometric and semantic point-level reasoning. We benchmark
our approach on three downstream tasks (classification, part
segmentation, and semantic segmentation) using a variety of datasets
covering synthetic and real-world as well as indoor and outdoor scenarios. Our
approach achieves new state-of-the-art results on all benchmarks. We will
release the source code publicly.
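
To make the idea of training-free, geometry-driven aggregation more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that per-point features have already been lifted from a 2D VLM onto the 3D points, and it repeatedly replaces each point's feature with a similarity-weighted average over its k nearest geometric neighbours, so that information spreads from local neighbourhoods outward over the iterations. The function names (`knn_indices`, `l2_normalize`, `aggregate`) and the parameters `k`, `iters`, and `temperature` are hypothetical choices made for illustration.

```python
import numpy as np

def knn_indices(points, k):
    # Brute-force pairwise squared distances; adequate for small clouds only.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    # Indices of each point's k nearest neighbours (the point itself included).
    return np.argsort(d2, axis=1)[:, :k]

def l2_normalize(x, eps=1e-8):
    # Scale every row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def aggregate(points, feats, k=8, iters=3, temperature=0.5):
    """Illustrative, training-free smoothing of per-point features.

    points: (N, 3) coordinates; feats: (N, D) features assumed to come from a VLM.
    Geometry selects the neighbours; feature similarity sets their weights.
    """
    feats = l2_normalize(feats)
    idx = knn_indices(points, k)                      # (N, k), geometric step
    for _ in range(iters):
        neigh = feats[idx]                            # (N, k, D)
        sim = np.einsum("nd,nkd->nk", feats, neigh)   # cosine similarity
        w = np.exp(sim / temperature)                 # semantic weighting
        w /= w.sum(axis=1, keepdims=True)
        feats = l2_normalize(np.einsum("nk,nkd->nd", w, neigh))
    return feats
```

Calling `aggregate(points, feats)` returns an array of the same shape as `feats` with unit-norm rows. For zero-shot classification or segmentation, these smoothed features could then be compared against text embeddings by cosine similarity; that step is outside this sketch.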