Performance Comparison of LSTM Models for SER

Tanushree Swain, Utkarsh Anand, Yashaswi Aryan, Soumya Khanra, Abhishek Raj, Suprava Patnaik

Lecture Notes in Electrical Engineering: Proceedings of International Conference on Communication, Circuits, and Systems (2021)

Abstract
Speech emotion recognition (SER) is essentially a sequence-analysis task, which makes LSTM models a natural benchmark for automatic recognition of emotions from speech. This work compares the performance of a stacked CNN-LSTM architecture against a stand-alone LSTM for emotion recognition. The key contributions are the exploitation of the stacked CNN-LSTM architecture and the augmentation of the training data to obtain robust and reliable performance. Results are reported on the RAVDESS database, with MFCCs extracted from preprocessed raw audio files used as input to the models. Accuracy and other metrics indicate that the hybrid CNN-LSTM achieves higher recognition accuracy than the stand-alone LSTM architecture, and that data augmentation supports better learning and robustness.
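To make the pipeline concrete, the sketch below illustrates the kind of setup the abstract describes: MFCC extraction from raw audio, simple waveform augmentation, and the two compared architectures (stand-alone LSTM vs. stacked CNN-LSTM). It assumes librosa and Keras; the MFCC count, frame length, layer sizes, and augmentation parameters are illustrative assumptions, not values reported in the paper.

```python
# Minimal sketch of an SER pipeline on MFCC features (assumptions, not the paper's exact setup).
import numpy as np
import librosa
from tensorflow.keras import layers, models

N_MFCC = 40          # assumed number of MFCC coefficients
MAX_FRAMES = 200     # assumed fixed number of time frames per utterance
N_CLASSES = 8        # RAVDESS defines eight emotion classes

def extract_mfcc(path, sr=22050):
    """Load an audio file and return a (MAX_FRAMES, N_MFCC) MFCC matrix."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T  # (frames, n_mfcc)
    # Pad or truncate to a fixed length so utterances can be batched.
    if mfcc.shape[0] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:MAX_FRAMES]

def augment(y, sr):
    """Simple waveform-level augmentations: additive noise and pitch shift."""
    noisy = y + 0.005 * np.random.randn(len(y))
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)
    return [noisy, shifted]

def build_lstm():
    """Stand-alone LSTM baseline operating directly on MFCC frames."""
    return models.Sequential([
        layers.Input(shape=(MAX_FRAMES, N_MFCC)),
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(64),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])

def build_cnn_lstm():
    """Stacked CNN-LSTM: 1D convolutions capture local spectral patterns,
    and the LSTM models their temporal evolution."""
    return models.Sequential([
        layers.Input(shape=(MAX_FRAMES, N_MFCC)),
        layers.Conv1D(64, kernel_size=5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, kernel_size=5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.LSTM(64),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])

model = build_cnn_lstm()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```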
Keywords
Automatic speech emotion recognition, CNN, LSTM, MFCC