LLMs for Robotic Object Disambiguation
CoRR(2024)
摘要
The advantages of pre-trained large language models (LLMs) are apparent in a
variety of language processing tasks. But can a language model's knowledge be
further harnessed to effectively disambiguate objects and navigate
decision-making challenges within the realm of robotics? Our study reveals the
LLM's aptitude for solving complex decision making challenges that are often
previously modeled by Partially Observable Markov Decision Processes (POMDPs).
A pivotal focus of our research is the object disambiguation capability of
LLMs. We detail the integration of an LLM into a tabletop environment
disambiguation task, a decision making problem where the robot's task is to
discern and retrieve a user's desired object from an arbitrarily large and
complex cluster of objects. Despite multiple query attempts with zero-shot
prompt engineering (details can be found in the Appendix), the LLM struggled to
inquire about features not explicitly provided in the scene description. In
response, we have developed a few-shot prompt engineering system to improve the
LLM's ability to pose disambiguating queries. The result is a model capable of
both using given features when they are available and inferring new relevant
features when necessary, to successfully generate and navigate down a precise
decision tree to the correct object–even when faced with identical options.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要