Acoustic-articulatory emotion recognition using multiple features and parameter-optimized cascaded deep learning network

Knowledge-Based Systems (2024)

Abstract
Multimodal emotion recognition is an important research direction in artificial intelligence. In this study, we propose a model for acoustic-articulatory emotion recognition. The model extracts Interspeech 2009 (IS09) features from the acoustic data and our proposed phase space reconstruction-geometric (PSR-G) features from the articulatory data. It then feeds the concatenated features into an improved sparrow search algorithm (ISSA)-cascaded deep learning (CDL) network to obtain the final recognition results. The PSR-G features capture phase and geometric information: the reconstructed phase-space signal of the articulatory data is plotted in three-dimensional space, and geometric features based on distances and angles are extracted from it. We also propose the ISSA-CDL network for emotion recognition, in which the CDL network effectively merges acoustic and articulatory features and combines the strengths of a one-dimensional convolutional neural network, a multi-head self-attention mechanism, and a two-layer bidirectional long short-term memory (BiLSTM) network. Finally, we propose the ISSA, which uses a tent map and the firefly algorithm to optimize the parameters of the CDL network, reducing the instability and randomness that arise when these parameters are set from subjective experience. Experiments on the self-recorded STEM-E2VA database yielded the following results: (1) PSR-G features give higher recognition accuracy on the articulatory data than other existing features. (2) The CDL network effectively merges the bimodal features, and the ISSA effectively optimizes the parameters of the CDL network. (3) The final acoustic-articulatory emotion recognition accuracy is 95.87 ± 0.29%, which is higher than that obtained with acoustic features (81.16 ± 0.47%) or articulatory features (93.27 ± 0.47%) alone.
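To make the PSR-G idea more concrete, here is a minimal sketch of phase space reconstruction by time-delay embedding for one 1-D articulatory channel. Each sample is embedded as a 3-D point, and a few distance and angle statistics are then summarized. The abstract does not give the embedding delay, the exact distances and angles used, or how they are aggregated. The delay `tau`, the choice of centroid distances and turning angles, and the mean/standard-deviation summary are all hypothetical, so this is not the authors' feature definition.

```python
import math
from statistics import mean, pstdev


def phase_space_reconstruct(signal, tau, dim=3):
    """Time-delay embedding: point i is (s[i], s[i+tau], ..., s[i+(dim-1)*tau])."""
    n_points = len(signal) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("signal too short for the chosen tau and dim")
    return [tuple(signal[i + k * tau] for k in range(dim)) for i in range(n_points)]


def angle_between(u, v):
    """Angle in radians between vectors u and v; 0.0 if either has zero length."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def geometric_features(points):
    """Hypothetical summary: distances to the trajectory centroid and turning
    angles between consecutive steps (needs at least 3 points)."""
    centroid = tuple(mean(c) for c in zip(*points))
    dists = [math.dist(p, centroid) for p in points]
    steps = [tuple(b - a for a, b in zip(p, q)) for p, q in zip(points, points[1:])]
    angles = [angle_between(s, t) for s, t in zip(steps, steps[1:])]
    return [mean(dists), pstdev(dists), mean(angles), pstdev(angles)]


if __name__ == "__main__":
    # Synthetic stand-in for one articulatory sensor trace (not real data).
    sig = [math.sin(0.1 * t) for t in range(200)]
    pts = phase_space_reconstruct(sig, tau=5)
    print(len(pts), geometric_features(pts))
```

In a full pipeline, a vector like this would be computed per articulatory channel and concatenated with the IS09 acoustic features before they enter the network. A straight-line trajectory has zero turning angle, which gives a quick sanity check.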
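The two ingredients the abstract names for the ISSA can each be sketched in isolation. The first is a tent-map chaotic sequence used to spread an initial population of candidate parameter settings over their bounds. The second is a firefly-style move that pulls a candidate toward a better one. The abstract does not say how these are wired into the sparrow search algorithm, or which CDL parameters are tuned. The skewed tent map with `a=0.7`, the seed `x0`, the firefly constants, and the example bounds are therefore assumptions. This is not the full ISSA.

```python
import math
import random


def tent_sequence(x0, n, a=0.7):
    """n values of the skewed tent map: x -> x/a if x < a, else (1-x)/(1-a).
    Values stay in [0, 1]; avoid the fixed points 0 and 1/(2-a)."""
    xs, x = [], x0
    for _ in range(n):
        x = x / a if x < a else (1 - x) / (1 - a)
        xs.append(x)
    return xs


def tent_init(pop_size, bounds, x0=0.37, a=0.7):
    """Map one chaotic tent sequence onto each parameter's [low, high] range."""
    chaos = tent_sequence(x0, pop_size * len(bounds), a)
    pop = []
    for i in range(pop_size):
        row = chaos[i * len(bounds):(i + 1) * len(bounds)]
        pop.append([lo + c * (hi - lo) for c, (lo, hi) in zip(row, bounds)])
    return pop


def firefly_move(x, brighter, bounds, beta0=1.0, gamma=1.0, alpha=0.1, rng=random):
    """A common firefly step: attraction beta0*exp(-gamma*r^2) toward a brighter
    solution, plus uniform noise scaled to each parameter's range."""
    r2 = sum((xi - bi) ** 2 for xi, bi in zip(x, brighter))
    beta = beta0 * math.exp(-gamma * r2)
    new = []
    for xi, bi, (lo, hi) in zip(x, brighter, bounds):
        step = xi + beta * (bi - xi) + alpha * (rng.random() - 0.5) * (hi - lo)
        new.append(min(hi, max(lo, step)))  # clamp to the search bounds
    return new


if __name__ == "__main__":
    # Hypothetical search space: learning rate and number of BiLSTM units.
    bounds = [(1e-4, 1e-2), (32, 256)]
    pop = tent_init(5, bounds)
    print(pop)
    print(firefly_move(pop[0], pop[1], bounds, rng=random.Random(0)))
```

Compared with uniform random initialization, a chaotic sequence is often used to cover the search range more evenly. In an actual optimizer, a move like `firefly_move` would be applied only when the other candidate has a better fitness, for example a lower validation loss of the CDL network.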
Keywords
Acoustic-articulatory emotion recognition, Multimodal feature fusion, PSR-G features, ISSA-CDL network