High-Acoustic Fidelity Text To Speech Synthesis With Fine-Grained Control Of Speech Attributes

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)

Abstract
Recently developed neural-based TTS models have focused on robustness and finer control over acoustic features such as phoneme duration, energy, and F0, allowing users some degree of control over the prosody of the generated speech. We propose a model with fine-grained attribute control that also has better acoustic fidelity (the output attributes we want to control do not deviate from the control signals) than previously proposed models, as shown in our experiments. Unlike other models, our proposed model does not require fine-tuning the vocoder on its outputs, indicating that it generates higher-quality mel-spectrograms that are closer to the ground-truth distribution than those of other models.
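Acoustic fidelity, as described in the abstract, is about how closely a controlled attribute of the output follows its control signal. The sketch below is one possible way to quantify that idea and is not taken from the paper: the function names, the use of mean absolute deviation, the semitone scale for F0, and the convention that unvoiced frames are marked with 0 Hz are all assumptions made for illustration.

```python
from math import log2


def mean_abs_deviation(target, measured):
    """Mean absolute difference between a control signal and the measured attribute.

    Hypothetical helper: suitable for per-phoneme durations or per-frame energy.
    """
    if len(target) != len(measured) or not target:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(t - m) for t, m in zip(target, measured)) / len(target)


def f0_deviation_semitones(target_hz, measured_hz):
    """Mean absolute F0 deviation in semitones.

    Assumption: a value of 0 marks an unvoiced frame, and only frames voiced
    in both contours are compared.
    """
    pairs = [(t, m) for t, m in zip(target_hz, measured_hz) if t > 0 and m > 0]
    if not pairs:
        raise ValueError("no frames voiced in both contours")
    return sum(abs(12 * log2(m / t)) for t, m in pairs) / len(pairs)
```

Under this reading, a lower value means the generated speech tracks the requested duration, energy, or F0 more closely. The metrics actually reported in the paper's experiments may differ.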
Keywords
acoustic features, control signals, fine grained attribute control, finer control, generated speech, high-acoustic fidelity text, phoneme duration, robustness, speech attributes, speech synthesis, TTS models