Imitation Learning Based on Visual-text Fusion for Robotic Sorting Tasks

2022 International Conference on Frontiers of Artificial Intelligence and Machine Learning (FAIML)(2022)

Abstract
In this paper, we propose an imitation learning method based on visual-text fusion for manipulation tasks. Actions are predicted from text instructions: the manipulation task is abstracted into text instructions, the semantic concepts in those instructions are learned, and they are combined with spatial features for visual inference. The construction process and demonstration content of the expert demonstration dataset are described in detail, with a focus on decomposing the manipulation task through text. In addition, we present the learning process and the network structure of the functional modules to highlight the fusion of text features with visual features. The effectiveness of the method is verified by an imitation learning experiment on a simulated multi-step manipulation task. The results show that the behavioral strategy achieved a 92.19% task completion rate on known objects and 80.03% on unknown objects. They demonstrate that the introduction of text enables the manipulation task to be decomposed in terms of abstract semantics, reducing the difficulty of learning. Meanwhile, the behavioral strategy can perform accurate spatial location inference based on text features, thereby achieving accurate action prediction.
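The abstract does not specify the fusion architecture, but a common way to combine an instruction embedding with visual spatial features, broadly consistent with the description above, is to tile the text vector over the spatial grid and concatenate it channel-wise before the policy head. The sketch below illustrates this pattern with NumPy; all dimensions, the pooling step, and the linear head are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch of channel-wise visual-text fusion (assumed design,
# not the paper's actual network). Dimensions are arbitrary placeholders.
rng = np.random.default_rng(0)

H, W = 8, 8          # spatial resolution of the visual feature map
C_vis = 16           # visual feature channels (e.g. CNN backbone output)
C_txt = 4            # text-instruction embedding size
N_actions = 6        # size of a hypothetical discrete action space

visual_feat = rng.standard_normal((C_vis, H, W))  # stand-in for CNN features
text_emb = rng.standard_normal(C_txt)             # stand-in for text encoder output

# Tile the text embedding across every spatial location, then concatenate
# with the visual features along the channel axis.
text_map = np.broadcast_to(text_emb[:, None, None], (C_txt, H, W))
fused = np.concatenate([visual_feat, text_map], axis=0)  # (C_vis + C_txt, H, W)

# A linear head over globally pooled fused features stands in for the policy.
pooled = fused.mean(axis=(1, 2))                  # global average pooling
W_head = rng.standard_normal((N_actions, C_vis + C_txt))
logits = W_head @ pooled
action = int(np.argmax(logits))

print(fused.shape, logits.shape, action)
```

Tiling keeps the text signal available at every spatial location, which matches the abstract's claim that the policy performs spatial location inference conditioned on text features.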
Keywords
visual-text fusion, vision-based manipulation, language grounding for robotics, imitation learning