Deep learning for robust feature generation in audiovisual emotion recognition

ICASSP (2013)

Citations: 461 | Views: 137
Abstract
Automatic emotion recognition systems predict high-level affective content from low-level human-centered signal cues. These systems have seen great improvements in classification accuracy, due in part to advances in feature selection methods. However, many of these feature selection methods capture only linear relationships between features or alternatively require the use of labeled data. In this paper we focus on deep learning techniques, which can overcome these limitations by explicitly capturing complex non-linear feature interactions in multimodal data. We propose and evaluate a suite of Deep Belief Network models, and demonstrate that these models show improvement in emotion classification performance over baselines that do not employ deep learning. This suggests that the learned high-order non-linear relationships are effective for emotion recognition.
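The abstract describes stacking Deep Belief Network layers to learn non-linear features from multimodal data without labels. The paper does not include code; the following is a minimal, hypothetical sketch of the DBN building block, a single Restricted Boltzmann Machine trained with one-step contrastive divergence (CD-1). All names, dimensions, and the toy input are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """One Restricted Boltzmann Machine layer: the unsupervised unit
    stacked to form a Deep Belief Network. Trained with CD-1."""

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible-unit biases
        self.b_h = np.zeros(n_hidden)   # hidden-unit biases
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0_prob = self.hidden_probs(v0)
        h0 = (self.rng.random(h0_prob.shape) < h0_prob).astype(float)
        # Negative phase: one Gibbs step back to a reconstruction.
        v1_prob = self.visible_probs(h0)
        h1_prob = self.hidden_probs(v1_prob)
        # CD-1 gradient approximation and parameter update.
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
        self.b_v += lr * (v0 - v1_prob).mean(axis=0)
        self.b_h += lr * (h0_prob - h1_prob).mean(axis=0)
        return float(np.mean((v0 - v1_prob) ** 2))  # reconstruction error

# Toy stand-in for concatenated audio + video feature vectors (binary).
rng = np.random.default_rng(1)
X = (rng.random((200, 20)) < 0.3).astype(float)
rbm = RBM(n_visible=20, n_hidden=8)
errs = [rbm.cd1_step(X) for _ in range(50)]
features = rbm.hidden_probs(X)  # learned non-linear features
```

In a full DBN, the hidden activations of one trained RBM become the visible input of the next layer, which is how higher-order cross-modal feature interactions are captured.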
Keywords
low-level human-centered signal cues, emotion classification, audiovisual emotion recognition, learning (artificial intelligence), robust feature generation, deep learning techniques, emotion recognition, deep learning, deep belief networks, multimodal data, deep belief network models, high-level affective content, unsupervised feature learning, feature selection methods, multimodal features, speech, speech recognition, speech processing, accuracy, acoustics