Fusing Dialogue and Gaze From Discussions of 2D and 3D Scenes

Adjunct of the 2019 International Conference on Multimodal Interaction (2019)

Abstract
Conversation partners rely on inference using each other’s gaze and utterances to negotiate shared meaning. In contrast, dialogue systems still operate mostly through unimodal question-and-answer or command-and-response interactions. To realize systems that can intuitively discuss and collaborate with humans, we should consider other sensory information. We begin to address this limitation with an innovative study that acquires, analyzes, and fuses interlocutors’ discussion and gaze. Introducing a discussion-based elicitation task, we collect gaze with remote and wearable eye trackers alongside dialogue as interlocutors come to consensus on questions about an on-screen 2D image and a real-world 3D scene. We analyze the resulting visual-linguistic patterns and map the modalities onto the visual environment by extending a multimodal image-region annotation framework that uses statistical machine translation for multimodal fusion, applying three ways of fusing speakers’ gaze and discussion.
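The abstract does not spell out the three fusion methods, but a common first step in SMT-style gaze-language fusion is to pair time-stamped spoken words with concurrently fixated image regions, producing parallel "sentence pairs" that a word-alignment model can consume. The sketch below illustrates that temporal co-occurrence pairing only; the `Word`, `Fixation`, and `fuse` names, the ±0.5 s window, and the sample data are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: build parallel (word sequence, region sequence) pairs
# from time-stamped transcripts and fixation logs, the input format expected
# by SMT word-alignment models (e.g., IBM Model 1). All names are assumed.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # onset in seconds
    end: float    # offset in seconds

@dataclass
class Fixation:
    region: str   # label of the fixated image region
    start: float
    end: float

def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the temporal overlap between two intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def fuse(words: list[Word], fixations: list[Fixation], window: float = 0.5):
    """Pair each word with regions fixated within +/- `window` seconds.

    Returns one parallel 'sentence pair': the word sequence and the
    concatenated sequence of co-occurring region labels.
    """
    word_seq, region_seq = [], []
    for w in words:
        word_seq.append(w.text)
        for f in fixations:
            # A region counts as co-occurring if its fixation interval
            # overlaps the word's padded time span at all.
            if overlap(w.start - window, w.end + window, f.start, f.end) > 0:
                region_seq.append(f.region)
    return word_seq, region_seq

if __name__ == "__main__":
    words = [Word("the", 0.0, 0.2), Word("red", 0.2, 0.5), Word("barn", 0.5, 0.9)]
    fixes = [Fixation("barn", 0.1, 0.8), Fixation("sky", 0.9, 1.2)]
    print(fuse(words, fixes))
```

Under this assumption, varying whose fixations are paired with whose speech (speaker's own gaze, the listener's gaze, or both) would yield three distinct fusion inputs, which is one plausible reading of the "three ways of fusing" mentioned above.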
Keywords
2D and 3D scenes, dialogue, eye movements, gaze, multimodal fusion, spoken discussion