CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot
CoRR (2024)
Abstract
This paper introduces CognitiveDog, a pioneering quadruped robot powered by a
Large Multimodal Model (LMM) that is capable not only of communicating with
humans verbally but also of physically interacting with the environment
through object manipulation. The system was realized on a Unitree Go1 robot
dog equipped with a custom gripper and demonstrated autonomous
decision-making capabilities, independently determining the most appropriate
actions and interactions with various objects to fulfill user-defined tasks.
These tasks do not necessarily include direct instructions, challenging the
robot to comprehend and execute them based on natural language input and
environmental cues. The paper delves into the intricacies of the system, the
dataset characteristics, and the software architecture. Key to this
development is the robot's proficiency in navigating space using Visual-SLAM,
effectively manipulating and transporting objects, and providing insightful
natural language commentary during task execution. Experimental results
highlight the robot's advanced task comprehension and adaptability,
underscoring its potential in real-world applications. The dataset used to
fine-tune the robot dog's behavior generation model is provided at the
following link:
huggingface.co/datasets/ArtemLykov/CognitiveDog_dataset
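Since the dataset is hosted on the Hugging Face Hub under the repository ID
given in the link above, it can be fetched with the standard `datasets`
library. The sketch below only loads and inspects it; the split names and
field layout are not described in the abstract, so nothing beyond the
repository ID is assumed.

```python
# Minimal sketch: load the released fine-tuning dataset from the Hugging Face
# Hub. The repository ID comes from the link in the abstract; splits and
# fields are unknown here, so we just print whatever the first record holds.
from datasets import load_dataset

ds = load_dataset("ArtemLykov/CognitiveDog_dataset")

print(ds)                     # show the available splits and row counts
first_split = next(iter(ds))  # pick whichever split is listed first
print(ds[first_split][0])     # inspect the fields of one example
```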
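The abstract describes a loop in which the LMM translates the user's task and
the robot's visual observations into concrete actions, narrated in natural
language as it goes. The following sketch is purely illustrative of that kind
of loop; the `Skill` type, the primitive set, and the callback signatures are
hypothetical stand-ins, not the paper's actual interface.

```python
# Illustrative vision-and-language-to-action loop, under assumed interfaces:
# a behavior generation model maps (task, camera frame) to the next skill.
from dataclasses import dataclass
from typing import Callable

SKILLS = ("go_to", "pick_up", "put_down", "say")  # assumed primitive set


@dataclass
class Skill:
    name: str      # one of SKILLS
    argument: str  # e.g. an object label, a waypoint, or an utterance


def run_task(
    task: str,
    query_lmm: Callable[[str, bytes], Skill],  # behavior generation model
    get_observation: Callable[[], bytes],      # camera / Visual-SLAM frontend
    execute: Callable[[Skill], None],          # low-level controller + gripper
) -> None:
    """Closed loop: observe, let the model pick a skill, execute, repeat."""
    while True:
        skill = query_lmm(task, get_observation())
        execute(skill)
        # The robot comments on its progress; stop once it reports completion.
        if skill.name == "say" and "done" in skill.argument.lower():
            break
```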