TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition
arXiv (2024)
Abstract
One persistent challenge in Speech Emotion Recognition (SER) is the
ubiquitous environmental noise, which frequently results in diminished SER
performance in practical use. In this paper, we introduce a Two-level
Refinement Network, dubbed TRNet, to address this challenge. Specifically, a
pre-trained speech enhancement module is employed for front-end noise reduction
and noise level estimation. Then, during model training, clean speech
spectrograms and their corresponding deep representations serve as reference
signals to correct the spectrogram distortion and representation shift of the
enhanced speech. Experimental results validate that the proposed TRNet substantially
increases the system's robustness in both matched and unmatched noisy
environments, without compromising its performance in clean environments.
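
The two refinement levels described above can be illustrated with a minimal sketch of a combined training objective. The abstract does not state the loss functions, their weights, or how the estimated noise level enters the model, so everything below is an assumption made for illustration: the function names, the choice of mean absolute error for the spectrogram term, mean squared error for the representation term, and the `spec_weight` and `repr_weight` parameters are hypothetical and are not taken from the paper. Noise level estimation is omitted entirely.

```python
import math


def mean_abs_error(spec_a, spec_b):
    """Mean absolute difference between two equally shaped 2-D lists."""
    diffs = [abs(x - y)
             for row_a, row_b in zip(spec_a, spec_b)
             for x, y in zip(row_a, row_b)]
    return math.fsum(diffs) / len(diffs)


def mean_sq_error(vec_a, vec_b):
    """Mean squared difference between two equally long 1-D lists."""
    diffs = [(x - y) ** 2 for x, y in zip(vec_a, vec_b)]
    return math.fsum(diffs) / len(diffs)


def cross_entropy(class_probs, label):
    """Negative log-probability assigned to the true emotion label."""
    return -math.log(class_probs[label])


def combined_loss(class_probs, label,
                  refined_spec, clean_spec,
                  enhanced_repr, clean_repr,
                  spec_weight=1.0, repr_weight=1.0):
    """Hypothetical objective: emotion loss plus two reference-based terms.

    Level 1 (assumed): pull the refined spectrogram of the enhanced speech
    toward the clean spectrogram.
    Level 2 (assumed): pull the deep representation of the enhanced speech
    toward the representation of the clean speech.
    """
    emotion_term = cross_entropy(class_probs, label)
    spec_term = mean_abs_error(refined_spec, clean_spec)
    repr_term = mean_sq_error(enhanced_repr, clean_repr)
    return emotion_term + spec_weight * spec_term + repr_weight * repr_term


# Toy example: a two-class prediction, a 1x2 "spectrogram", a 2-D embedding.
loss = combined_loss(
    class_probs=[0.5, 0.5], label=0,
    refined_spec=[[1.0, 2.0]], clean_spec=[[0.0, 2.0]],
    enhanced_repr=[1.0, 1.0], clean_repr=[0.0, 1.0],
)
```

Under these assumptions the clean references are needed only at training time, which matches the abstract's statement that they are used "during model training"; at inference the model would see only the enhanced speech.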