ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
CoRR (2024)
Abstract
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model
(LLM) designed for embodied interaction, exploring universal 3D object
understanding with 3D point clouds and language. ShapeLLM is built upon an
improved 3D encoder, ReCon++, which extends ReCon with multi-view image
distillation for enhanced geometry understanding. Using ReCon++ as the 3D
point cloud input encoder for LLMs, ShapeLLM is trained on constructed
instruction-following data and tested on our newly human-curated evaluation
benchmark, 3D MM-Vet. ReCon++ and ShapeLLM achieve state-of-the-art
performance in 3D geometry understanding and language-unified 3D interaction
tasks, such as embodied visual grounding.