Linear Prediction-based Parallel WaveGAN Speech Synthesis

2022 International Conference on Electronics, Information, and Communication (ICEIC) (2022)

Abstract
This paper proposes a linear prediction (LP)-based neural speech synthesis method for the Parallel WaveGAN (PWG) framework. The recently proposed PWG vocoder successfully generates waveform sequences using a fast non-autoregressive WaveNet model. However, its outputs are often noisy because it has difficulty capturing the complicated time-varying nature of speech signals. To improve synthesis quality, we introduce a back-propagable LP synthesis method into the PWG framework. Based on the source-filter theory of speech production, the proposed PWG model learns the behavior of the source excitation signal, which is decoupled from the speech signal by the LP synthesis filter. This makes it possible to train on the characteristics of the excitation signal alone while still accounting for the interaction between the vocal source and the vocal tract filter, which further improves the quality of the synthesized speech. Objective and subjective evaluation results verify that the proposed method reconstructs synthetic speech of significantly better quality than conventional methods.
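The source-filter decoupling described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `lp_synthesis` and `lp_analysis`, the fixed (non-time-varying) coefficients, and the sign convention s[n] = e[n] + sum_k a_k s[n-k] are assumptions chosen for illustration. The paper's filter is back-propagable and would in practice use frame-wise coefficients.

```python
import numpy as np

def lp_synthesis(excitation, lp_coeffs):
    """All-pole LP synthesis filter (illustrative, fixed coefficients).

    Assumed convention: s[n] = e[n] + sum_{k=1..p} a_k * s[n-k].
    """
    p = len(lp_coeffs)
    speech = np.zeros(len(excitation), dtype=float)
    for n in range(len(excitation)):
        acc = excitation[n]
        # Add the contribution of the previous p output samples.
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += lp_coeffs[k - 1] * speech[n - k]
        speech[n] = acc
    return speech

def lp_analysis(speech, lp_coeffs):
    """Inverse (FIR) filter that recovers the excitation.

    e[n] = s[n] - sum_{k=1..p} a_k * s[n-k].
    """
    p = len(lp_coeffs)
    excitation = np.array(speech, dtype=float)
    for n in range(len(speech)):
        for k in range(1, p + 1):
            if n - k >= 0:
                excitation[n] -= lp_coeffs[k - 1] * speech[n - k]
    return excitation

# Hypothetical stable second-order filter and a random excitation.
rng = np.random.default_rng(0)
coeffs = np.array([0.5, -0.2])
e = rng.standard_normal(64)

s = lp_synthesis(e, coeffs)          # excitation -> speech-like signal
e_rec = lp_analysis(s, coeffs)       # speech-like signal -> excitation
print(np.allclose(e, e_rec))
```

In this picture, a vocoder in the proposed style would generate the excitation `e`, and the synthesis filter would shape it into the speech waveform. The round trip through `lp_analysis` only confirms that the two filters are inverses of each other.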
Keywords
Speech synthesis, text-to-speech, neural vocoder, linear prediction, Parallel WaveGAN