Sequence-to-sequence Modelling for Categorical Speech Emotion Recognition Using Recurrent Neural Network

2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia)(2018)

Abstract
To model categorical speech emotion recognition as a sequence task, the first challenge is how to convert the single categorical label of each utterance into a label sequence. To address this, we hypothesize that an utterance consists of alternating emotional and non-emotional segments, where the non-emotional segments correspond to silent regions, short pauses, transitions between phonemes, fricative phonemes, etc. Under this hypothesis, we treat an utterance's label sequence as a chain of two kinds of states: emotional states denoting emotional frames and Nulls denoting non-emotional frames. We then use a connectionist temporal classification based recurrent neural network (CTC-RNN) to automatically label and align an utterance's emotional segments with emotional labels and its non-emotional segments with non-emotional labels. Experimental results on the IEMOCAP corpus demonstrate the effectiveness of the proposed method compared to state-of-the-art emotion recognition algorithms.
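The abstract does not spell out how a frame-level path relates to an utterance's label sequence. As a minimal sketch, assuming the Null state plays the role of the CTC blank, the standard CTC collapsing map (merge consecutive repeats, then drop blanks) can be illustrated as below. The names `NULL` and `ctc_collapse` and the example frames are hypothetical and are not taken from the paper.

```python
from itertools import groupby

# Assumption: "Null" stands in for the non-emotional (CTC blank) label.
NULL = "Null"

def ctc_collapse(frame_labels):
    """Apply the standard CTC collapsing map to a frame-level label path.

    Step 1: merge runs of identical consecutive labels into one label.
    Step 2: drop every Null (blank) label.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != NULL]

# Illustrative frame path: two emotional segments separated by a pause.
frames = ["Null", "Null", "angry", "angry", "Null", "angry", "Null"]
print(ctc_collapse(frames))  # ['angry', 'angry']
```

Many different frame-level paths collapse to the same label sequence, which is why a CTC objective can be trained without frame-level emotion annotations.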
Keywords
Speech emotion recognition, Recurrent neural network, Connectionist temporal classification