Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models
CoRR (2023)
Abstract
We introduce Dream2Real, a robotics framework which integrates
vision-language models (VLMs) trained on 2D data into a 3D object rearrangement
pipeline. This is achieved by having the robot autonomously construct a 3D
representation of the scene, in which objects can be rearranged virtually and
an image of each resulting arrangement rendered. These renders are evaluated by a
VLM, so that the arrangement which best satisfies the user instruction is
selected and recreated in the real world with pick-and-place. This enables
language-conditioned rearrangement to be performed zero-shot, without needing
to collect a training dataset of example arrangements. Results on a series of
real-world tasks show that this framework is robust to distractors,
controllable by language, capable of understanding complex multi-object
relations, and readily applicable to both tabletop and 6-DoF rearrangement
tasks.
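The "imagine, render, score, select" loop described in the abstract can be sketched as a search over candidate object poses. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the object is moved in SE(2) over a small tabletop grid. `render` and `score` are hypothetical callables that stand in for the 3D scene renderer and the VLM image-text scorer. The toy versions used in the usage lines only mimic a VLM that prefers one location.

```python
from dataclasses import dataclass
from math import hypot
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Pose:
    """Hypothetical SE(2) tabletop pose for the object being moved."""
    x: float
    y: float
    yaw: float  # radians


def candidate_poses(xs: List[float], ys: List[float], yaws: List[float]) -> List[Pose]:
    """Enumerate a grid of virtual placements (assumed sampling scheme)."""
    return [Pose(x, y, yaw) for x in xs for y in ys for yaw in yaws]


def select_arrangement(
    candidates: List[Pose],
    render: Callable[[Pose], object],       # hypothetical: renders the imagined scene
    score: Callable[[object, str], float],  # hypothetical: VLM image-text score
    instruction: str,
) -> Tuple[Optional[Pose], float]:
    """Render each virtual arrangement, score it against the instruction,
    and return the highest-scoring pose together with its score."""
    best_pose, best_score = None, float("-inf")
    for pose in candidates:
        image = render(pose)
        s = score(image, instruction)
        if s > best_score:
            best_pose, best_score = pose, s
    return best_pose, best_score


# Toy stand-ins: the "image" is just the pose, and the "VLM" ignores the
# instruction and prefers placements close to (0.5, 0.2).
def toy_render(pose: Pose) -> Pose:
    return pose


def toy_score(image: Pose, instruction: str) -> float:
    return -hypot(image.x - 0.5, image.y - 0.2)


grid = candidate_poses([0.0, 0.25, 0.5, 0.75], [0.0, 0.2, 0.4], [0.0])
best, best_s = select_arrangement(grid, toy_render, toy_score, "put the apple in the bowl")
print(best)
```

In the real system, the selected pose would then be handed to a pick-and-place routine. The exhaustive grid is chosen here only for clarity; any sampler that proposes candidate poses could replace `candidate_poses`.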