An Empirical Experiment on Feature Extractions Based for Speech Emotion Recognition

Binh Van Duong, Chien Nhu Ha, Trung T. Nguyen, Phuc Nguyen, Trong-Hop Do

ACIIDS (2) (2022)

Abstract
In recent years, virtual assistants have become an essential part of many applications on smart devices. In these applications, users speak to the virtual assistant to issue commands, which makes speech emotion recognition an important problem for improving the quality of such services. However, speech emotion recognition is not a straightforward task, as emotion can be expressed through various features, and a deep understanding of these features is crucial to achieving good results. To this end, this paper conducts empirical experiments on three kinds of speech features, the Mel-spectrogram, Mel-frequency cepstral coefficients (MFCCs), and the Tempogram, together with their variants, for the task of speech emotion recognition. Convolutional Neural Networks, Long Short-Term Memory networks, a Multi-layer Perceptron classifier, and the Light Gradient Boosting Machine are used to build emotion classification models on top of these features. Two popular datasets, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Crowd-Sourced Emotional Multimodal Actors Dataset (CREMA-D), are used to train the models.
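To make the three feature families concrete, below is a minimal extraction sketch using the librosa library. The input filename and parameter choices (n_mels=128, n_mfcc=40) are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch of the three speech features named in the abstract,
# extracted with librosa. Parameters here are assumptions for illustration.
import librosa
import numpy as np

# Load a mono waveform at librosa's default 22.05 kHz sample rate.
# "speech_sample.wav" is a hypothetical input file.
y, sr = librosa.load("speech_sample.wav")

# Mel-spectrogram: power spectrogram projected onto a mel filter bank,
# log-scaled as is common for CNN input.
mel_spec = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel_spec)

# MFCCs: cepstral coefficients derived from the log-mel spectrogram.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)

# Tempogram: local autocorrelation of the onset strength envelope,
# capturing rhythmic and tempo-related information.
onset_env = librosa.onset.onset_strength(y=y, sr=sr)
tempogram = librosa.feature.tempogram(onset_envelope=onset_env, sr=sr)

# Each feature is a (bands, frames) matrix over time.
print(log_mel.shape, mfcc.shape, tempogram.shape)
```

The log-scaled Mel-spectrogram is the form typically fed to CNNs as a 2D input, while MFCC and tempogram matrices are commonly summarized over time (e.g., by averaging frames) before being passed to an MLP or LightGBM classifier.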
Keywords
feature extractions, emotion