Demonstrating Multi-modal Human Instruction Comprehension with AR Smart Glass

2023 15th International Conference on COMmunication Systems & NETworkS (COMSNETS), 2023

Abstract
We present a multi-modal human instruction comprehension prototype for object acquisition tasks that involve verbal, visual, and pointing-gesture cues. The prototype consists of an AR smart glass for issuing instructions and a Jetson TX2 pervasive device for executing the comprehension algorithms. With this setup, we enable on-device, computationally efficient comprehension of object acquisition tasks with an average latency in the range of 150-330 ms.
Keywords
Human-AI Collaboration, Referring Expression Comprehension, Visual Grounding, Multi-Modal Networks, Pervasive Systems