Advanced Fusion-Based Speech Emotion Recognition System Using a Dual-Attention Mechanism with Conv-Caps and Bi-GRU Features

ELECTRONICS(2022)

引用 22|浏览13
暂无评分
摘要
Recognizing the speaker's emotional state from speech signals plays a very crucial role in human-computer interaction (HCI). Nowadays, numerous linguistic resources are available, but most of them contain samples of a discrete length. In this article, we address the leading challenge in Speech Emotion Recognition (SER), which is how to extract the essential emotional features from utterances of a variable length. To obtain better emotional information from the speech signals and increase the diversity of the information, we present an advanced fusion-based dual-channel self-attention mechanism using convolutional capsule (Conv-Cap) and bi-directional gated recurrent unit (Bi-GRU) networks. We extracted six spectral features (Mel-spectrograms, Mel-frequency cepstral coefficients, chromagrams, the contrast, the zero-crossing rate, and the root mean square). The Cony-Cap module was used to obtain Mel-spectrograms, while the Bi-GRU was used to obtain the rest of the spectral features from the input tensor. The self-attention layer was employed in each module to selectively focus on optimal cues and determine the attention weight to yield high-level features. Finally, we utilized a confidence-based fusion method to fuse all high-level features and pass them through the fully connected layers to classify the emotional states. The proposed model was evaluated on the Berlin (EMO-DB), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and Odia (SITB-OSED) datasets to improve the recognition rate. During experiments, we found that our proposed model achieved high weighted accuracy (WA) and unweighted accuracy (UA) values, i.e., 90.31% and 87.61%, 76.84% and 70.34%, and 87.52% and 86.19%, respectively, demonstrating that the proposed model outperformed the state-of-the-art models using the same datasets.
更多
查看译文
关键词
affective computing, bi-directional gated recurrent unit, convolutional capsule network, confidence-based fusion, speech emotion recognition
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要