Whether Contribution of Features Differ Between Video-Mediated and In-Person Meetings in Important Utterance Estimation

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract
This study investigated how the contributions of various features differ between in-person (IP) and video-mediated (VM) meetings. We focused on estimating important utterances, using both an IP and a VM meeting corpus as the analysis data. A transformer model with dialogue history was used to estimate important utterances, and five types of input (text, speaker’s audio, others’ audio, speaker’s video, and others’ video) were fed to the model. A comparison of the IP and VM models revealed that the speaker’s audio has a strong effect on the IP model, the video of the other participants strongly affects the VM model, and the text and others’ audio strongly affect both models in estimating important utterances.
Keywords
multiparty, multimodal, in-person, video-mediated, important utterance