Parallel Tacotron 2 - A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling.

Interspeech(2021)

引用 30|浏览47
暂无评分
摘要
This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model which does not require supervised duration signals. The duration model is based on a novel attention mechanism and an iterative reconstruction loss based on Soft Dynamic Time Warping, this model can learn token-frame alignments as well as token durations automatically. Experimental results show that Parallel Tacotron 2 outperforms baselines in subjective naturalness in several diverse multi speaker evaluations. Its duration control capability is also demonstrated.
更多
查看译文
关键词
neural TTS,non-autoregressive,duration
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要