Addressee Detection Using Facial and Audio Features in Mixed Human-Human and Human-Robot Settings: A Deep Learning Framework

IEEE Systems, Man, and Cybernetics Magazine (2023)

Abstract
Addressee detection (AD) enables a robot to interact smoothly with humans by determining whether it is being addressed. However, the problem has not been widely explored. The few existing studies focused on human-to-human or human-to-robot conversations confined to a meeting room and relied on gaze and utterance. These works used statistical and rule-based approaches, which tend to depend on specific settings. Further, they did not fully leverage the available audio and visual information or the short-term and long-term segments, and they did not explore combining important conversational cues, namely facial and audio features. In addition, no annotated audiovisual spatiotemporal dataset captured in mixed human-to-human and human-to-robot settings is available to support exploring the area with new approaches.
Keywords
Deep learning, Visualization, Annotations, Input variables, Human-robot interaction, Oral communication, Predictive models