MMSF: A Multimodal Sentiment-Fused Method to Recognize Video Speaking Style

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval (2023)

Abstract
As talking occupies a large proportion of human lives, a deeper understanding of human conversations is necessary. Speaking style recognition aims to recognize the styles of conversations, providing a fine-grained description of talking. Current works rely only on visual cues to recognize speaking styles and therefore cannot accurately distinguish speaking styles that are visually similar. To recognize speaking styles more effectively, we propose a novel multimodal sentiment-fused method, MMSF, which extracts and integrates visual, audio and textual features of videos. In addition, since sentiment is one of the motivations of human behavior, we introduce sentiment into our multimodal method for the first time, using a cross-attention mechanism that enhances the video features for speaking style recognition. The proposed MMSF is evaluated on the long-form video understanding benchmark, and the experimental results show that it is superior to the state of the art.
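The abstract describes fusing sentiment features into the multimodal video representation via cross-attention. The paper's code is not reproduced here; the following is only a minimal PyTorch sketch of that general idea, in which the module name, feature dimensions, fusion order, and number of speaking-style classes are all assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of sentiment-guided cross-attention fusion (not the authors' code).
# Assumption: visual, audio and textual features have already been extracted and
# projected to a shared dimension, and sentiment features are available as tokens.
import torch
import torch.nn as nn


class SentimentCrossAttentionFusion(nn.Module):
    """Fuse sentiment features into multimodal video features with cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4, num_classes: int = 4):
        super().__init__()
        # Video tokens act as queries; sentiment tokens act as keys and values.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, num_classes)  # speaking-style classes (assumed count)

    def forward(self, video_tokens: torch.Tensor, sentiment_tokens: torch.Tensor):
        # video_tokens:     (B, Tv, dim) concatenated visual/audio/text features
        # sentiment_tokens: (B, Ts, dim) sentiment features
        attended, _ = self.cross_attn(video_tokens, sentiment_tokens, sentiment_tokens)
        fused = self.norm(video_tokens + attended)   # residual sentiment enhancement
        return self.classifier(fused.mean(dim=1))    # pool over tokens, then classify


# Usage with dummy tensors
model = SentimentCrossAttentionFusion()
video = torch.randn(2, 16, 256)      # batch of 2 clips, 16 multimodal tokens each
sentiment = torch.randn(2, 8, 256)   # 8 sentiment tokens per clip
logits = model(video, sentiment)     # shape: (2, num_classes)
```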
Keywords
Speaking style recognition, multimodal analysis, sentiment analysis, long-form video understanding