Intrusive and Non-Intrusive Perceptual Speech Quality Assessment Using a Convolutional Neural Network

2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019

Abstract
Speech quality, as perceived by humans, is an important performance metric for telephony and voice services. It is typically measured through subjective listening tests, which can be tedious and expensive. Algorithms such as PESQ and POLQA serve as a computational proxy for subjective listening tests. Here we propose using a convolutional neural network to predict the perceived quality of speech with noise, reverberation, and distortions, both intrusively and non-intrusively, i.e., with and without a clean reference signal. The network model is trained and evaluated on a corpus of about ten thousand utterances labeled by human listeners to derive a Mean Opinion Score (MOS) for each utterance. It is shown to provide more accurate MOS estimates than existing speech quality metrics, including PESQ and POLQA. The proposed method reduces the root mean squared error from 0.48 to 0.35 MOS points and increases the Pearson correlation from 0.78 to 0.89 compared to the state-of-the-art POLQA algorithm.
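The two figures of merit quoted above, root mean squared error and Pearson correlation between predicted and subjective MOS, can be computed as in the following minimal sketch. The function names and the sample scores are hypothetical and chosen only for illustration; they are not taken from the paper or its corpus.

```python
import math


def rmse(predicted, subjective):
    """Root mean squared error between predicted and subjective MOS values."""
    if len(predicted) != len(subjective) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - s) ** 2 for p, s in zip(predicted, subjective)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson(predicted, subjective):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_s)


# Hypothetical scores on the 1-5 MOS scale (illustration only, not paper data).
subjective_mos = [1.8, 2.5, 3.1, 3.9, 4.4]
predicted_mos = [2.0, 2.4, 3.4, 3.6, 4.5]

print(f"RMSE:    {rmse(predicted_mos, subjective_mos):.3f} MOS points")
print(f"Pearson: {pearson(predicted_mos, subjective_mos):.3f}")
```

A lower RMSE means the estimates sit closer to the listener-derived scores in absolute terms, while a higher Pearson correlation means the estimates track the ranking and spread of those scores more faithfully; the abstract reports both because a metric can do well on one and poorly on the other.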
Keywords
Speech quality, mean opinion score, PESQ, POLQA