Improved Speech Enhancement Considering Speech PSD Uncertainty

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2022)

Abstract
Speech enhancement based on statistical models has been studied for several decades. Recently, a speech enhancement approach adopting a speech power spectral density (PSD) uncertainty model has been proposed. This approach distinguishes the true speech PSD from its estimate and treats both as random variables. It incorporates a prior distribution of speech spectra and speech PSD estimators to derive the PSD uncertainty-aware counterpart to conventional clean speech estimators, which results in improved performance. However, the speech PSD uncertainty model has not yet been adopted for the estimation of other parameters in the speech enhancement framework, such as the a posteriori speech presence probability (SPP), the noise PSD, and the speech power spectrum. In this paper, we incorporate the speech PSD uncertainty model into all components of the statistical model-based speech enhancement framework by deriving PSD uncertainty-aware counterparts to conventional parameter estimators. Specifically, we derive the a posteriori SPP in which the likelihood function for each hypothesis is based on the speech PSD uncertainty. With this a posteriori SPP, a novel SPP-based noise PSD estimator is derived. We also derive the minimum mean-square error (MMSE) estimator for the power spectrum of the clean speech in the current frame under speech PSD uncertainty, which is exploited to refine the speech PSD estimator. Finally, the refined speech PSD estimator is incorporated into the spectral gain function based on the speech PSD uncertainty model. In our experiments, the proposed approach showed improved noise PSD estimation performance in terms of the averaged logarithmic error distance, and improved speech enhancement performance in terms of noise reduction, segmental signal-to-noise ratio, perceptual evaluation of speech quality (PESQ) scores, and short-time objective intelligibility. It also exhibited performance comparable to that of a real-time deep learning-based speech enhancement system in terms of PESQ scores and composite measures on the VoiceBank-DEMAND dataset.
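
For orientation, the kind of conventional SPP-based noise PSD estimator that the abstract refers to can be sketched as follows. This is a minimal illustration of a common fixed-prior formulation, not the uncertainty-aware estimator derived in the paper. The function name `spp_noise_psd_update` and the parameter values (a fixed a priori SNR of 15 dB under speech presence, equal priors, smoothing factor 0.8) are assumptions chosen for illustration.

```python
import math


def spp_noise_psd_update(periodogram, noise_psd_prev,
                         xi_h1_db=15.0, p_h1=0.5, alpha=0.8):
    """One frame of a conventional SPP-based noise PSD update (illustrative).

    periodogram    : list of |Y|^2 values, one per frequency bin
    noise_psd_prev : list of previous noise PSD estimates, one per bin
    xi_h1_db       : assumed fixed a priori SNR under speech presence (dB)
    p_h1           : assumed prior probability of speech presence
    alpha          : assumed recursive smoothing factor
    Returns (spp, noise_psd) as two lists.
    """
    xi = 10.0 ** (xi_h1_db / 10.0)       # a priori SNR on a linear scale
    prior_ratio = (1.0 - p_h1) / p_h1    # P(H0) / P(H1)
    spp, noise_psd = [], []
    for y2, n_prev in zip(periodogram, noise_psd_prev):
        # A posteriori SNR, using the previous noise PSD estimate.
        post_snr = y2 / max(n_prev, 1e-12)
        # A posteriori SPP under complex Gaussian models for both hypotheses.
        p = 1.0 / (1.0 + prior_ratio * (1.0 + xi)
                   * math.exp(-post_snr * xi / (1.0 + xi)))
        # MMSE estimate of the noise periodogram: trust the observation when
        # speech is likely absent, fall back on the old estimate otherwise.
        noise_periodogram = (1.0 - p) * y2 + p * n_prev
        spp.append(p)
        # First-order recursive smoothing over frames.
        noise_psd.append(alpha * n_prev + (1.0 - alpha) * noise_periodogram)
    return spp, noise_psd
```

Calling the function once per frame, with the returned `noise_psd` fed back in as `noise_psd_prev`, tracks the noise floor over time. In this conventional form the likelihoods depend on a fixed a priori SNR. The paper instead bases the likelihood for each hypothesis on the speech PSD uncertainty model, which this sketch does not attempt to reproduce.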
Keywords
MMSE estimation, noise PSD estimation, spectral speech enhancement, speech presence probability, speech PSD estimation, uncertainty