The Generalization Effect For Multilingual Speech Emotion Recognition Across Heterogeneous Languages

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2019)

Abstract
Regularization approaches, such as multi-task learning and dropout, prevent overfitting and improve generalization ability. Speech emotion recognition suffers from insufficiently transcribed databases, where labels are subjectively annotated. Because emotions are a more universally recognized language, the paralinguistic feature space of emotional speech can be better generalized, even across substantially heterogeneous languages. We investigate the effect of regularization and normalization frameworks on two emotional speech databases, the IEMOCAP for English and the JTES for Japanese. We obtain absolute gains of unweighted average recall over ten runs (1.48% for the IEMOCAP and 1.03% for the JTES) and achieve a maximum of 59.49% on the IEMOCAP. From comparative experiments, we confirm that dropout and multi-task learning strategies are effective for multilingual speech emotion recognition, and common normalization over two languages leads to further improvement under all conditions, which suggests that better generalization is available even when two highly heterogeneous languages are merged.
Keywords
speech emotion recognition, data normalization, generalization, multi-task learning, multilingual
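
The abstract reports results as unweighted average recall and compares normalization over two languages. As a rough illustration only, the sketch below computes unweighted average recall (the mean of per-class recalls) and applies a z-score normalization using statistics pooled over two corpora. The abstract does not specify the normalization formula or the features used, so the z-score choice and the function names `unweighted_average_recall`, `z_normalize`, and `common_normalization` are assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Recall for class c: fraction of samples labeled c that were predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def z_normalize(features, mean, std, eps=1e-8):
    """Z-score normalization (assumed here; the abstract names no formula)."""
    return (features - mean) / (std + eps)


def common_normalization(corpus_a, corpus_b):
    """Normalize both corpora with statistics pooled over the two languages."""
    pooled = np.concatenate([corpus_a, corpus_b], axis=0)
    mean, std = pooled.mean(axis=0), pooled.std(axis=0)
    return z_normalize(corpus_a, mean, std), z_normalize(corpus_b, mean, std)


# Toy data: class 0 has recall 2/3, class 1 has recall 1.
uar = unweighted_average_recall([0, 0, 0, 1], [0, 0, 1, 1])

# Hypothetical one-dimensional features standing in for two language corpora.
english = np.array([[0.0], [2.0]])
japanese = np.array([[4.0], [6.0]])
english_n, japanese_n = common_normalization(english, japanese)
```

With pooled statistics, the two corpora share one feature scale and keep their relative offset, whereas normalizing each corpus separately would center each language on its own mean.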