NOISE LEVEL LIMITED SUB-MODELING FOR DIFFUSION PROBABILISTIC VOCODERS
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021)
Abstract
Although the diffusion probabilistic vocoders WaveGrad and DiffWave can realize real-time, high-fidelity speech synthesis with a simple training loss function, a single model predicts the noise components over the full range of noise levels in all iterations. This paper proposes a simple but effective noise level-limited sub-modeling framework for diffusion probabilistic vocoders, yielding Sub-WaveGrad and Sub-DiffWave. The proposed method also provides a DiffWave conditioned on a continuous noise level, as in WaveGrad, and spectral enhancement post-filtering. The proposed Sub-WaveGrad and Sub-DiffWave models are realized using 10 sub-models. These sub-models are trained separately with different noise level limits, and only the necessary sub-models are used during inference, according to the noise schedule. The results of experiments using a Japanese female speech corpus indicate that both the proposed Sub-WaveGrad and Sub-DiffWave outperform vanilla WaveGrad and DiffWave in terms of model accuracy and synthesis quality while retaining the inference speed.
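The selection of sub-models by noise level can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the noise level limits are chosen, so the equal-width split of a noise level normalized to [0, 1] and the names `make_limits`, `sub_model_index`, and `required_sub_models` are all assumptions made for illustration.

```python
from bisect import bisect_right


def make_limits(num_sub_models=10, lo=0.0, hi=1.0):
    """Return hypothetical equal-width noise-level boundaries (an assumption)."""
    step = (hi - lo) / num_sub_models
    return [lo + i * step for i in range(num_sub_models + 1)]


def sub_model_index(noise_level, limits):
    """Return the index of the sub-model whose limits contain noise_level."""
    idx = bisect_right(limits, noise_level) - 1
    # Clamp so that values at or beyond the ends map to the first/last sub-model.
    return min(max(idx, 0), len(limits) - 2)


def required_sub_models(noise_schedule, limits):
    """Return the sorted indices of the sub-models an inference schedule needs."""
    return sorted({sub_model_index(level, limits) for level in noise_schedule})


limits = make_limits()
# A short, made-up inference schedule touches only a few of the 10 sub-models.
needed = required_sub_models([0.05, 0.08, 0.55, 0.95], limits)
```

Under these assumptions, a schedule with only a few iterations would load just the sub-models in `needed`, which is consistent with the abstract's claim that inference speed is retained.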
Keywords
Speech synthesis, diffusion probabilistic vocoder, WaveGrad, DiffWave, sub-modeling