Domain Generalization with Triplet Network for Cross-Corpus Speech Emotion Recognition

2021 IEEE Spoken Language Technology Workshop (SLT), 2021

Abstract
Domain generalization is a major challenge for cross-corpus speech emotion recognition. The performance of recognition systems built on "seen" source corpora inevitably degrades when they are tested against "unseen" target corpora that have different speakers, channels, and languages. We present a novel framework based on a triplet network to learn more generalized features of emotional speech that are invariant across multiple corpora. To reduce the intrinsic discrepancies between source and target corpora, an explicit feature transformation based on the triplet network is implemented as a preprocessing step. Extensive comparison experiments are carried out on three emotional speech corpora: two English corpora and one Japanese corpus. Remarkable improvements of up to 35.61% are achieved across all cross-corpus speech emotion recognition experiments, and we show that the proposed framework using the triplet network is effective for obtaining more generalized features across multiple emotional speech corpora.
Keywords
Speech emotion recognition, cross-corpus, domain generalization, triplet network
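
The abstract names a triplet network but does not state its loss function, distance metric, or how triplets are sampled. As a rough illustration only, the sketch below uses the standard hinge-style triplet loss with Euclidean distance. The function names, the margin value, the toy 2-D vectors, and the choice of anchor and positive as same-emotion samples from different corpora are all assumptions made here, not details taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (an assumed hyperparameter).
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings with made-up values. Assumption: anchor and positive
# carry the same emotion label but come from different corpora, while the
# negative carries a different emotion label.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]   # distance 5 from the anchor
negative = [6.0, 8.0]   # distance 10 from the anchor

# Negative is already far enough away: 5 - 10 + 1 < 0, so the loss is clipped.
loss_satisfied = triplet_loss(anchor, positive, negative, margin=1.0)

# Swapping the roles violates the margin: 10 - 5 + 1 = 6.
loss_violated = triplet_loss(anchor, negative, positive, margin=1.0)

print(loss_satisfied, loss_violated)
```

Minimizing a loss of this kind pulls same-label samples together and pushes different-label samples apart in the embedding space, which is one common way to encourage features that depend on the emotion rather than on the corpus.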