Design of a Convolutional Neural Network for Speech Emotion Recognition

ICTC (2020)

Abstract
In speech emotion recognition (SER), recognition accuracy increases as more data are employed, and a large amount of data is essential for deep learning in particular. An existing data set, however, is limited in size, and the recordings it contains can be inconsistent in length. The data set used in this paper consists of audio files of utterances of various lengths. We extracted one-dimensional data and two-dimensional mel-spectrogram images from the speech files and trained deep learning models, such as a multi-layer perceptron (MLP) and a convolutional neural network (CNN), on them. In addition, to improve the test accuracy, the audio files were reduced to less than two seconds and preprocessed. Using the CNN, we obtained a test accuracy of approximately 60%.
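As a rough illustration of the preprocessing the abstract describes, the following sketch brings a waveform to a fixed two-second length and converts it to a log-mel spectrogram that could be fed to a CNN. It is not the authors' code. The function names, the 16 kHz sampling rate, the FFT size, the hop length, the number of mel bands, and the choice to zero-pad short clips are all assumptions made for the example; the abstract only states that files were reduced to less than two seconds.

```python
import numpy as np

def fix_length(wave, sr, max_seconds=2.0):
    """Trim or zero-pad a 1-D waveform to max_seconds (padding is an assumption)."""
    target = int(sr * max_seconds)
    if len(wave) >= target:
        return wave[:target]
    return np.pad(wave, (0, target - len(wave)))

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr, n_fft=512, hop=256, n_mels=64):
    """Return a (n_mels, n_frames) log-mel spectrogram in decibels."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T               # small offset avoids log(0)

# Example with synthetic audio standing in for a 3.5-second utterance
sr = 16000  # assumed sampling rate
utterance = np.random.default_rng(0).standard_normal(int(3.5 * sr))
image = log_mel_spectrogram(fix_length(utterance, sr), sr)
print(image.shape)  # (64, 124)
```

Because every clip is brought to the same number of samples, every spectrogram has the same shape, which is what lets utterances of different lengths be batched together as fixed-size two-dimensional inputs.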
Keywords
convolutional neural network, neural network