A Time-Frequency Attention Mechanism with Subsidiary Information for Effective Speech Emotion Recognition

Man-Machine Speech Communication (2023)

Abstract
Speech emotion recognition (SER) is the task of automatically identifying human emotions from the analysis of utterances. In practice, the task is often affected by subsidiary information such as speaker or phoneme identity. Traditional domain adaptation approaches are commonly applied to remove this unwanted domain-specific knowledge, but they unavoidably discard useful categorical information as well. In this paper, we propose a time-frequency attention mechanism based on multi-task learning (MTL). It uses the content of the feature map itself to compute self-attention over the time and channel dimensions, while the attention weights over the frequency dimension are derived from domain information extracted by the MTL branch. We conduct extensive evaluations on the IEMOCAP benchmark to assess the effectiveness of the proposed representation. Results show a recognition performance of 73.24% weighted accuracy (WA) and 73.18% unweighted accuracy (UA) over four emotions, outperforming the baseline by about 4%.
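To make the described mechanism concrete, the following is a minimal PyTorch sketch of a time-frequency attention block in the spirit of the abstract: channel and time weights are computed from the feature map itself, while frequency weights are conditioned on an auxiliary embedding (e.g. a speaker or phoneme representation from the MTL head). All layer sizes, shapes, and module names here are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn


class TimeFrequencyAttention(nn.Module):
    """Sketch of a time-frequency attention block (assumed design, not the paper's exact one)."""

    def __init__(self, channels: int, freq_bins: int, aux_dim: int):
        super().__init__()
        # Channel attention from a globally pooled descriptor (squeeze-and-excite style).
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 2),
            nn.ReLU(),
            nn.Linear(channels // 2, channels),
            nn.Sigmoid(),
        )
        # Time attention from a 1-D convolution over the frame axis.
        self.time_conv = nn.Sequential(
            nn.Conv1d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Frequency attention conditioned on the auxiliary (MTL) embedding.
        self.freq_fc = nn.Sequential(
            nn.Linear(aux_dim, freq_bins),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        # x:   (batch, channels, time, freq) feature map from a CNN front end
        # aux: (batch, aux_dim) subsidiary embedding from the auxiliary task
        b, c, t, f = x.shape

        # Channel weights -> (batch, channels, 1, 1)
        ch_w = self.channel_fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)

        # Time weights: pool over frequency, convolve over frames -> (batch, 1, time, 1)
        time_w = self.time_conv(x.mean(dim=3)).view(b, 1, t, 1)

        # Frequency weights from the auxiliary embedding -> (batch, 1, 1, freq)
        freq_w = self.freq_fc(aux).view(b, 1, 1, f)

        return x * ch_w * time_w * freq_w


if __name__ == "__main__":
    block = TimeFrequencyAttention(channels=64, freq_bins=40, aux_dim=128)
    feats = torch.randn(8, 64, 100, 40)   # e.g. log-Mel feature maps
    aux_emb = torch.randn(8, 128)         # e.g. speaker embedding from the MTL head
    print(block(feats, aux_emb).shape)    # torch.Size([8, 64, 100, 40])
```

In this sketch the three attention maps are applied multiplicatively to the feature map; the actual paper may combine them differently or use other pooling and projection choices.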
Keywords
speech emotion recognition, convolutional neural network, multi-task learning, attention mechanism