Multimodal Error Correction with Natural Language and Pointing Gestures

2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

Abstract
Error correction is crucial in human-computer interaction, as it can provide supervision for incrementally learning artificial intelligence. If a system maps entities such as objects or persons of unknown class to inappropriate existing classes, or misrecognizes entities from known classes because the train-test discrepancy is too high, error correction is a natural way for a user to improve the system. Given an agent with visual perception, pointing gestures can dramatically simplify error correction whenever the referred entity is in the system's view. Therefore, we propose a modularized system for multimodal error correction using natural language and pointing gestures. First, pointing line generation and region proposal detect whether there is a pointing gesture and, if so, which candidate objects (i.e., RoIs) lie on the pointing line. Second, these RoIs (if any) and the user's utterances are fed into a VL-T5 network, which either extracts and links the class name and the corresponding RoI of the referred entity, or outputs that there is no error correction. In the latter case, the utterances can be passed to a downstream component for Natural Language Understanding. We evaluate our proposed system on additional, challenging annotations for an existing real-world pointing gesture dataset. Furthermore, we demonstrate our approach by integrating it on a real-world steerable laser pointer robot, enabling interactive multimodal error correction and thus incremental learning of new objects.
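
The following is a minimal, illustrative sketch of the first stage described above: filtering candidate RoIs by their distance to a 2-D pointing line in image coordinates. The function name, the distance threshold, and the geometric criterion are assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np

def rois_on_pointing_line(origin, direction, rois, max_dist=30.0):
    """Keep RoIs whose centers lie close to the pointing line.

    origin, direction: 2-D pointing line in image coordinates.
    rois: array of boxes [x1, y1, x2, y2].
    Returns indices of RoIs within max_dist pixels of the line and in
    front of the pointing origin, sorted along the pointing direction.
    """
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    centers = (rois[:, :2] + rois[:, 2:]) / 2.0
    rel = centers - np.asarray(origin, dtype=float)
    along = rel @ d                                   # signed distance along the line
    perp = np.abs(rel @ np.array([-d[1], d[0]]))      # perpendicular distance to the line
    keep = np.where((along > 0) & (perp < max_dist))[0]
    return keep[np.argsort(along[keep])]

if __name__ == "__main__":
    rois = np.array([[100, 100, 140, 140],   # near the line
                     [300,  50, 340,  90],   # far from the line
                     [200, 190, 240, 230]])  # near the line, farther along
    print(rois_on_pointing_line(origin=(0, 0), direction=(1, 1), rois=rois))
    # -> [0 2]
```

In the full pipeline, the RoIs returned by such a filter, together with the user's utterance, would be passed to the VL-T5 network for class-name extraction and RoI linking.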