Non-Intrusive Estimation of Speech Signal Parameters using a Frame-based Machine Learning Approach

Dushyant Sharma, Lucia Berger, Carl Quillen,Patrick A. Naylor

2020 28th European Signal Processing Conference (EUSIPCO)（2021）

引用 2|浏览6

暂无评分

摘要

We present a novel, non-intrusive method that jointly estimates acoustic signal properties associated with the perceptual speech quality, level of reverberation and noise in a speech signal. We explore various machine learning frameworks, consisting of popular feature extraction front-ends and two types of regression models and show the trade-off in performance that must be considered with each combination. We show that a short-time framework consisting of an 80-dimension log-Mel filter bank feature front-end employing spectral augmentation, followed by a 3 layer LSTM recurrent neural network model achieves a mean absolute error of 3.3 dB for C50, 2.3 dB for segmental SNR and 0.3 for PESQ estimation on the Libri Augmented (LA) database. The internal VAD for this system achieves an F1 score of 0.93 on this data. The proposed system also achieves a 2.4 dB mean absolute error for C50 estimation on the ACE test set. Furthermore, we show how each type of acoustic parameter correlates with ASR performance in terms of ground truth labels and additionally show that the estimated C50, SNR and PESQ from our proposed method have a high correlation (greater than 0.92) with WER on the LA test set.

查看译文

关键词

deep neural networks,clarity index,speech quality

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要