Diffusion Generative Vocoder for Fullband Speech Synthesis Based on Weak Third-order SDE Solver

Conference of the International Speech Communication Association (INTERSPEECH), 2022

Abstract
Diffusion generative models, which generate data by the time-reverse dynamics of diffusion processes, have attracted much attention recently and have already been applied in the speech domain, for example to speech waveform synthesis. These models initially suffered from slow synthesis, but many fast samplers have since been proposed and this disadvantage is being overcome. The authors previously proposed an efficient sampler based on a second-order approximation derived from the Itô-Taylor series, with some success. This study further examines the possibility of incorporating third-order terms and experimentally verifies that a vocoder using this method can synthesize high-fidelity fullband (48 kHz) speech signals faster than real time. It is also shown that the method is applicable to speech bandwidth extension from wideband (16 kHz) to fullband (48 kHz).
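To make "generating data by the time-reverse dynamics of a diffusion process" concrete, the following is a minimal sketch and not the sampler studied in the paper. It uses the plain first-order Euler-Maruyama scheme rather than a second- or third-order Itô-Taylor scheme, and it replaces the neural score network with the analytic score of a one-dimensional Gaussian toy distribution. The forward SDE, the constants `BETA`, `MU`, `SIGMA0`, and all function names are assumptions introduced only for illustration.

```python
import numpy as np

BETA = 10.0            # constant noise rate of the assumed forward SDE
MU, SIGMA0 = 2.0, 0.5  # mean and std of the toy "data" distribution (assumed)


def marginal(t):
    """Mean and variance at time t of dx = -0.5*BETA*x dt + sqrt(BETA) dw,
    started from N(MU, SIGMA0**2) at t = 0."""
    decay = np.exp(-BETA * t)
    mean = MU * np.sqrt(decay)
    var = SIGMA0**2 * decay + (1.0 - decay)
    return mean, var


def score(x, t):
    """Analytic score (gradient of the log density) of the Gaussian marginal.
    In a real vocoder this would be a trained network."""
    mean, var = marginal(t)
    return -(x - mean) / var


def reverse_sample(n_samples=20000, n_steps=2000, t_end=1.0, seed=0):
    """Integrate the reverse-time SDE from t_end down to 0 with Euler-Maruyama."""
    rng = np.random.default_rng(seed)
    h = t_end / n_steps
    # At t_end the forward marginal is approximately N(0, 1).
    x = rng.standard_normal(n_samples)
    for i in range(n_steps):
        t = t_end - i * h
        # Reverse-time drift: f(x, t) - g(t)^2 * score(x, t)
        drift = -0.5 * BETA * x - BETA * score(x, t)
        noise = rng.standard_normal(n_samples)
        # Step backward in time by h.
        x = x - drift * h + np.sqrt(BETA * h) * noise
    return x


if __name__ == "__main__":
    samples = reverse_sample()
    # The sample mean and std should come out close to MU and SIGMA0.
    print(samples.mean(), samples.std())
```

A first-order scheme like this one typically needs many small steps to keep the discretization error low. Higher-order samplers of the kind the abstract describes aim to reach comparable quality with fewer steps, which matters when the target is faster-than-real-time synthesis.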
Keywords
fullband speech synthesis, diffusion, third-order