StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
arxiv(2022)
摘要
Diffusion models have shown a great ability at bridging the performance gap
between predictive and generative approaches for speech enhancement. We have
shown that they may even outperform their predictive counterparts for
non-additive corruption types or when they are evaluated on mismatched
conditions. However, diffusion models suffer from a high computational burden,
mainly as they require to run a neural network for each reverse diffusion step,
whereas predictive approaches only require one pass. As diffusion models are
generative approaches they may also produce vocalizing and breathing artifacts
in adverse conditions. In comparison, in such difficult scenarios, predictive
models typically do not produce such artifacts but tend to distort the target
speech instead, thereby degrading the speech quality. In this work, we present
a stochastic regeneration approach where an estimate given by a predictive
model is provided as a guide for further diffusion. We show that the proposed
approach uses the predictive model to remove the vocalizing and breathing
artifacts while producing very high quality samples thanks to the diffusion
model, even in adverse conditions. We further show that this approach enables
to use lighter sampling schemes with fewer diffusion steps without sacrificing
quality, thus lifting the computational burden by an order of magnitude. Source
code and audio examples are available online (https://uhh.de/inf-sp-storm).
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要