Speech emotion recognition using combination of features
Intelligent Control and Information Processing (2013)
Abstract
In this paper, we study how the number of speech features and their statistical values affect the accuracy of recognizing emotions in speech. Using a Gaussian Mixture Model (GMM), we identify two effective features, namely Mel Frequency Cepstrum Coefficients (MFCCs) and Auto Correlation Function Coefficients (ACFC), both extracted directly from the speech signal. Using a GMM supervector formed from the values of MFCCs, delta MFCCs and ACFC, we conduct experiments on the Berlin emotional database with six previously proposed emotions: anger, disgust, fear, happy, neutral and sad. Our method achieves an emotion recognition rate of 74.45%, significantly better than the 59.00% achieved previously. To demonstrate the broad applicability of our method, we also conduct experiments with a different set of emotions: anger, boredom, fear, happy, neutral and sad. Our emotion recognition rate of 75.00% is again better than the 71.00% achieved by a hidden Markov model method using MFCC, delta MFCC, cepstral coefficients and speech energy.
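The abstract does not give the feature-extraction details, so the following is only a minimal sketch of how frame-level autocorrelation coefficients and delta features could be computed and stacked. The frame length, hop size, number of lags, the two-point delta and all function names are illustrative assumptions, not the authors' settings. MFCC extraction and GMM supervector construction are left out because they would normally rely on an external library.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding at the end)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def acf_coefficients(frames, n_lags):
    """Normalized autocorrelation at lags 1..n_lags for every frame."""
    frames = frames - frames.mean(axis=1, keepdims=True)
    energy = np.sum(frames ** 2, axis=1) + 1e-12  # avoids division by zero
    coeffs = [np.sum(frames[:, :-k] * frames[:, k:], axis=1) / energy
              for k in range(1, n_lags + 1)]
    return np.stack(coeffs, axis=1)

def delta(features):
    """Two-point central difference across frames, edges repeated."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

# Synthetic stand-in for one second of speech at 16 kHz (assumed rate).
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 100.0 * t)

frames = frame_signal(signal, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
acf = acf_coefficients(frames, n_lags=12)
acf_delta = delta(acf)

# One feature vector per frame: static coefficients followed by their deltas.
features = np.hstack([acf, acf_delta])
print(features.shape)
```

In a fuller pipeline, per-frame MFCCs and delta MFCCs would be concatenated with these columns before a GMM is fitted and its component means are stacked into a supervector.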
Keywords
acfc, speech recognition, gmm supervector, cepstral coefficient, statistical values, speech signal, auto correlation function coefficients, berlin emotional database, emotion recognition, feature combination, gaussian processes, speech emotion recognition, speech energy, mel frequency cepstrum coefficients, hidden markov models, delta mfcc, gaussian mixture model, hidden markov model, correlation, mel frequency cepstral coefficient, accuracy, feature extraction, speech