DOMAIN-ADVERSARIAL AUTOENCODER WITH ATTENTION BASED FEATURE LEVEL FUSION FOR SPEECH EMOTION RECOGNITION
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021)(2021)
摘要
Over the past two decades, although speech emotion recognition (SER) has garnered considerable attention, the problem of insufficient training data has been unresolved. A potential solution for this problem is to pre-train a model and transfer knowledge from large amounts of audio data. However, the data used for pre-training and testing originate from different domains, resulting in the latent representations to contain non-affective information. In this paper, we propose a domain-adversarial autoencoder to extract discriminative representations for SER. Through domain-adversarial learning, we can reduce the mismatch between domains while retaining discriminative information for emotion recognition. We also introduce multi-head attention to capture emotion information from different subspaces of input utterances. Experiments on IEMOCAP show that the proposed model outperforms the state-of-the-art systems by improving the unweighted accuracy by 4.15%, thereby demonstrating the effectiveness of the proposed model.
更多查看译文
关键词
speech emotion recognition, domain adaptation, representation learning, multi-head attention
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要