MA-CapsNet-DA: Speech emotion recognition based on MA-CapsNet using data augmentation

Expert Systems with Applications (2024)

Abstract
Speech emotion recognition (SER) plays a crucial role in human-computer interaction (HCI) applications. However, it faces two challenges: the limited effectiveness of deep learning models and data scarcity. As a result, deep learning models used in SER suffer from serious overfitting. In this study, a novel SER model called the max-avg-pooling capsule network (MA-CapsNet), an improved capsule network customized for SER, is proposed. We also adopt data augmentation (DA) techniques to tackle the data scarcity issue. The proposed MA-CapsNet model consists of five sequential modules: conv-max-pooling, conv-avg-pooling, convolution, primary capsule, and digital capsule. Furthermore, a new evaluation metric called the expected accuracy index (EAI) is presented to evaluate model performance effectively. The proposed approach demonstrates strong advantages over its peer models under different data partition methods, especially on augmented datasets. Experimental results also show that the proposed model has good interpretability compared with peer methods, owing to its less complicated learning topology, relatively smaller parameter set, and fewer input features.
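The first two modules named in the abstract pair convolution with max pooling and with average pooling. The abstract gives no layer sizes or implementation details, so the following is only a minimal sketch of the two standard pooling operations on a 1-D sequence, not the authors' model. The function names, the window size, and the sample values are all hypothetical.

```python
from typing import List


def max_pool_1d(values: List[float], window: int) -> List[float]:
    """Non-overlapping max pooling: keep the largest value in each window."""
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


def avg_pool_1d(values: List[float], window: int) -> List[float]:
    """Non-overlapping average pooling: keep the mean of each window."""
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]


# Hypothetical feature values standing in for one row of a feature map.
features = [1.0, 3.0, 2.0, 6.0, 4.0, 0.0]

max_pooled = max_pool_1d(features, window=2)  # [3.0, 6.0, 4.0]
avg_pooled = avg_pool_1d(features, window=2)  # [2.0, 4.0, 2.0]
```

Max pooling keeps the strongest response in each window, while average pooling summarizes the whole window, so the two retain different aspects of the same feature map.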
Keywords
Speech emotion recognition, Capsule network, Data augmentation, Feature extraction, Deep learning