Exploiting temporal information to detect conversational groups in videos and predict the next speaker

Lucrezia Tosato, Victor Fortier, Isabelle Bloch, Catherine Pelachaud

Pattern Recognition Letters (2024)

Abstract
Studies in human-human interaction have introduced the concept of F-formation to describe the spatial arrangement of participants during social interactions. This paper has two objectives: detecting F-formations in video sequences and predicting the next speaker in a group conversation. The proposed approach exploits temporal information and multimodal signals of humans in video sequences. In particular, we rely on measuring people's engagement level as a feature of group belonging. Our approach uses a recurrent neural network, the Long Short-Term Memory (LSTM), to predict who will take the next speaking turn in a conversational group. Experiments on the MatchNMingle dataset led to 85% true positives in group detection and 98% accuracy in predicting the next speaker.
Keywords
F-formation, Clustering, Temporal information, Next speaker prediction, LSTM