Sequence-to-sequence Modelling for Categorical Speech Emotion Recognition Using Recurrent Neural Network
2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), 2018
Abstract
To model categorical speech emotion recognition as a sequential task, the first challenge is how to convert the single categorical label of each utterance into a label sequence. To address this, we hypothesize that an utterance consists of alternating emotional and non-emotional segments, where the non-emotional segments correspond to silent regions, short pauses, transitions between phonemes, fricative phonemes, etc. Under this hypothesis, we propose to treat an utterance's label sequence as a chain of two kinds of states: emotional states denoting emotional frames and Nulls denoting non-emotional frames. We then use a connectionist temporal classification based recurrent neural network (CTC-RNN) to automatically label and align an utterance's emotional segments with emotional labels and its non-emotional segments with non-emotional labels. Experimental results on the IEMOCAP corpus demonstrate the effectiveness of the proposed method compared to state-of-the-art emotion recognition algorithms.
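The relationship between a frame-level chain of emotional states and Nulls and a shorter label sequence can be illustrated with the standard CTC collapsing rule (merge consecutive repeats, then drop blanks). The sketch below is only an illustration under assumptions: the `NULL` symbol, the `"ang"` label, and the example frame path are hypothetical and are not taken from the paper, which may construct its label sequences differently.

```python
from itertools import groupby

NULL = "-"  # hypothetical symbol for a non-emotional (Null / CTC blank) frame


def collapse(frame_labels):
    """Apply the standard CTC collapsing rule to a frame-level path.

    Consecutive repeated labels are merged first, then Null labels are dropped.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != NULL]


# Hypothetical frame-level path for an utterance annotated as "ang" (angry):
# Null frames stand in for silences or pauses between emotional segments.
path = ["-", "-", "ang", "ang", "-", "ang", "-", "-"]
print(collapse(path))
```

Here the two runs of `"ang"` frames are separated by a Null frame, so they survive collapsing as two separate emotional segments rather than being merged into one.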
Keywords
Speech emotion recognition, Recurrent neural network, Connectionist temporal classification