Deep Modelling Strategies for Human Confidence Classification using Audio-visual Data.

2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2023

Abstract
Human behavioral expressions, such as confidence, are time-varying entities. Both the vocal and facial cues that convey human confidence keep varying throughout the duration of analysis. Although the cues from these two modalities are not always in synchrony, they influence each other and the fused outcome. In this paper, we present a deep fusion technique that combines the two modalities and derives a single outcome for inferring human confidence. The fused outcome improves classification performance by capturing the temporal information from both modalities. We also present an analysis of the time-varying nature of expressions in conversations captured in an interview setup. We collected data from 51 speakers who participated in interview sessions. In a 5-fold cross-validation analysis, the average area under the curve (AUC) of the uni-modal models using speech and facial expressions is 70.6% and 69.4%, respectively, for classifying confident videos from non-confident ones. Our deep fusion model improves the performance, giving an average AUC of 76.8%.
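The abstract does not detail the fusion architecture, so the following is only a minimal sketch of one common way to fuse two temporal modalities for binary classification: each modality gets its own recurrent encoder, the final hidden states are concatenated, and a small head produces a logit. The feature dimensions, GRU encoders, layer sizes, and class names here are illustrative assumptions, not the architecture reported in the paper.

```python
# Hypothetical deep audio-visual fusion sketch (not the paper's model).
import torch
import torch.nn as nn

class DeepFusionClassifier(nn.Module):
    def __init__(self, audio_dim=40, face_dim=136, hidden_dim=64):
        super().__init__()
        # Temporal encoders for each modality (assumed: single-layer GRUs).
        self.audio_enc = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.face_enc = nn.GRU(face_dim, hidden_dim, batch_first=True)
        # Fusion head: concatenate the two summaries, output a confident/non-confident logit.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, audio_seq, face_seq):
        # audio_seq: (batch, T_audio, audio_dim); face_seq: (batch, T_face, face_dim)
        _, h_audio = self.audio_enc(audio_seq)   # final hidden state summarizes the sequence
        _, h_face = self.face_enc(face_seq)
        fused = torch.cat([h_audio[-1], h_face[-1]], dim=-1)
        return self.classifier(fused).squeeze(-1)

# Usage sketch with random tensors standing in for real per-frame features.
model = DeepFusionClassifier()
logits = model(torch.randn(8, 100, 40), torch.randn(8, 150, 136))
print(logits.shape)  # torch.Size([8])
```

A design point such a sketch illustrates: because each modality keeps its own encoder up to the fusion layer, the two streams can have different frame rates and sequence lengths, which matches the observation that vocal and facial cues are not always in synchrony.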