Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation
arXiv (2024)
Abstract
Recent advancements in Generative Artificial Intelligence, particularly in
Large Language Models (LLMs) and Large Vision Language Models (LVLMs), have
enabled the prospect of leveraging cognitive planners within robotic systems.
This work focuses on solving the object goal navigation problem by mimicking
human cognition: attending to, perceiving, and storing task-specific
information, and generating plans from it. We introduce a comprehensive
framework capable of exploring an unfamiliar environment in search of an
object by leveraging the capabilities of LLMs and LVLMs in understanding the
underlying semantics of our world. A challenging task in using LLMs to
generate high-level sub-goals is efficiently representing the environment
around the robot. We propose a modular 3D scene representation, with
semantically rich descriptions of each object, to provide the LLM with
task-relevant information. However, providing the LLM with a large volume of
contextual information (the rich semantic 3D scene representation) can lead
to redundant and inefficient plans. We therefore propose an LLM-based pruner
that leverages in-context learning to prune out information irrelevant to the
goal.
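To make the two components named in the abstract more concrete, the Python sketch below shows one plausible shape for a semantically rich scene representation and for an in-context-learning pruner that filters it against the navigation goal. The SceneObject and SceneGraph fields, the prune_scene function, and the llm_complete callable are illustrative assumptions, not the authors' actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical sketch of a modular 3D scene representation: each detected
# object carries an LVLM-generated description plus a 3D position, so an
# LLM planner can reason over the scene in plain text.

@dataclass
class SceneObject:
    label: str                      # e.g. "mug"
    description: str                # rich description, e.g. "blue ceramic mug on the counter"
    position: Tuple[float, float, float]  # (x, y, z) in the robot's map frame

@dataclass
class SceneGraph:
    objects: List[SceneObject] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Serialize the scene into plain text for the LLM context window."""
        return "\n".join(
            f"- {o.label} at {o.position}: {o.description}" for o in self.objects
        )

def prune_scene(
    scene: SceneGraph,
    goal: str,
    llm_complete: Callable[[str], str],
) -> List[SceneObject]:
    """Keep only objects an LLM judges relevant to the goal.

    `llm_complete` is a placeholder for any prompt -> text callable;
    the in-context example in the prompt steers the pruning behaviour.
    """
    prompt = (
        "You filter scene descriptions for a robot searching for an object.\n"
        "Example:\n"
        "Goal: find a fork\n"
        "Scene:\n"
        "- sofa at (1, 0, 0): a grey fabric sofa\n"
        "- counter at (4, 2, 0): a kitchen counter with drawers\n"
        "Relevant: counter\n\n"
        f"Goal: {goal}\n"
        f"Scene:\n{scene.to_prompt()}\n"
        "Relevant:"
    )
    kept = {name.strip() for name in llm_complete(prompt).split(",")}
    return [o for o in scene.objects if o.label in kept]
```

In this reading, the pruned object list (rather than the full scene graph) is what would be handed to the planning LLM when it proposes the next high-level sub-goal, keeping the context window focused on goal-relevant semantics.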