Next-Speaker Prediction Based on Non-Verbal Information in Multi-Party Video Conversation

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
We propose a method for next-speaker prediction, the task of predicting which of the current listeners will speak in the next turn, in multi-party video conversation. Previous studies used non-verbal features, such as head movements and gaze behavior, for next-speaker prediction in face-to-face conversation. In video conversation, however, these features become vague and ineffective because each participant looks at a screen displaying the other participants rather than at them directly. Since non-verbal features reflect participant characteristics, robust next-speaker prediction requires training data covering rich combinations of participants; previous studies were restricted to the limited participant combinations present in their recorded data. The proposed method therefore uses 1) novel non-verbal features for next-speaker prediction in video conversation, specifically facial expressions, hand movements, and speech segments, and 2) data augmentation of participant combinations in the training data. Experiments on video-conversation data demonstrate the effectiveness of the proposed method.
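The abstract does not specify how the participant-combination augmentation is implemented. One plausible instantiation, sketched below purely for illustration, is to permute the participant slots of each recorded conversation so the model sees every ordering of the same participants; the dict-based sample format and the `next_speaker` key are assumptions, not the paper's actual data representation.

```python
import itertools

def augment_participant_combinations(sample):
    """Generate extra training samples by permuting the participant
    slots of one recorded conversation sample.

    `sample` is assumed (hypothetically) to map participant slot
    indices 0..N-1 to their non-verbal feature sequences, plus a
    "next_speaker" key holding the index of the actual next speaker.
    """
    slots = sorted(k for k in sample if isinstance(k, int))
    augmented = []
    for perm in itertools.permutations(slots):
        # Slot `new` receives the features of old participant perm[i].
        new_sample = {new: sample[old] for new, old in zip(slots, perm)}
        # Relabel the next-speaker target under the same permutation.
        new_sample["next_speaker"] = perm.index(sample["next_speaker"])
        augmented.append(new_sample)
    return augmented
```

For a three-party conversation this yields 3! = 6 samples per recording, which multiplies the effective number of participant arrangements without collecting new data.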
Keywords
multi-party conversation,video conversation,next-speaker prediction,non-verbal information