Can an Embodied Agent Find Your “Cat-shaped Mug”? LLM-Based Zero-Shot Object Navigation
IEEE Robotics and Automation Letters (2023)
Abstract
We present LGX (Language-guided Exploration), a novel algorithm for Language-Driven Zero-Shot Object Goal Navigation (L-ZSON), where an embodied agent navigates to a uniquely described target object in a previously unseen environment. Our approach makes use of Large Language Models (LLMs) for this task by leveraging the LLM's commonsense-reasoning capabilities to make sequential navigational decisions. Simultaneously, we perform generalized target object detection using a pre-trained Vision-Language grounding model. We achieve state-of-the-art zero-shot object navigation results on RoboTHOR, with a success rate (SR) improvement of over 27% over the current baseline, OWL-ViT CLIP on Wheels (OWL CoW). Furthermore, we study the usage of LLMs for robot navigation and present an analysis of how various prompting strategies affect the model output. Finally, we showcase the benefits of our approach via real-world experiments that indicate the superior performance of LGX in detecting and navigating to visually unique objects.
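The abstract describes using an LLM's commonsense reasoning to make sequential navigation decisions from detected scene objects. A minimal sketch of what such a decision step might look like is below; the function names, prompt wording, and action set are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of an LLM-driven navigation decision step:
# objects visible in each direction are formatted into a prompt, and the
# LLM's free-text reply is parsed into one of the agent's discrete actions.
# (The actual LLM call is omitted; only prompt building and parsing are shown.)

ACTIONS = ["MoveAhead", "RotateLeft", "RotateRight"]

def build_prompt(target: str, visible_objects: dict) -> str:
    """Format per-direction object lists into a commonsense-reasoning query."""
    lines = [f"You are looking for a '{target}'."]
    for direction, objs in visible_objects.items():
        lines.append(f"{direction}: {', '.join(objs) if objs else 'nothing'}")
    lines.append(f"Which action brings you closer? Answer with one of {ACTIONS}.")
    return "\n".join(lines)

def parse_action(reply: str) -> str:
    """Pick the first known action mentioned in the LLM reply; default to MoveAhead."""
    for action in ACTIONS:
        if action.lower() in reply.lower():
            return action
    return "MoveAhead"
```

In a full pipeline, `build_prompt` output would be sent to an LLM, and `parse_action` would map the reply onto the agent's action space; a grounding model would separately check each frame for the described target.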
Keywords
AI-Enabled Robotics, Human-Centered Robotics, Autonomous Agents, Domestic Robotics