DNN based multi-speaker speech synthesis with temporal auxiliary speaker ID embedding

2019 International Conference on Electronics, Information, and Communication (ICEIC)(2019)

Abstract
In this paper, multi-speaker speech synthesis using speaker embedding is proposed. The proposed model is based on the Tacotron network, but the post-processing network of the model is modified with dilated convolution layers, as used in the WaveNet architecture, to make it more adaptive to speech. The model can generate multiple speakers' voices with a single neural network by providing an auxiliary input, a speaker embedding, to the network. The model successfully generates two speakers' voices without significant deterioration of speech quality.
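The abstract names two mechanisms: a speaker ID embedding supplied as an auxiliary input at every time step, and a post-processing network built from dilated convolutions. The following is a minimal NumPy sketch of those two ideas only; all sizes, layer counts, and function names are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (illustrative, not from the paper)
NUM_SPEAKERS, EMB_DIM, FEAT_DIM, T = 2, 16, 80, 100

# Speaker ID embedding table: one learned vector per speaker
speaker_table = rng.standard_normal((NUM_SPEAKERS, EMB_DIM)) * 0.1

def condition_on_speaker(frames, speaker_id):
    """Broadcast one speaker's embedding across time and concatenate
    it to every frame (the 'temporal auxiliary speaker ID' input)."""
    emb = speaker_table[speaker_id]                 # (EMB_DIM,)
    tiled = np.tile(emb, (frames.shape[0], 1))      # (T, EMB_DIM)
    return np.concatenate([frames, tiled], axis=1)  # (T, FEAT_DIM + EMB_DIM)

def dilated_conv1d(x, w, dilation):
    """Causal 1-D dilated convolution over the time axis.
    x: (T, C_in), w: (K, C_in, C_out)."""
    K, _, C_out = w.shape
    pad = (K - 1) * dilation
    xp = np.pad(x, ((pad, 0), (0, 0)))  # left-pad so output length stays T
    out = np.zeros((x.shape[0], C_out))
    for k in range(K):
        out += xp[k * dilation : k * dilation + x.shape[0]] @ w[k]
    return np.maximum(out, 0.0)  # ReLU nonlinearity

# Post-processing stack: dilated convs with a growing receptive field,
# applied to speaker-conditioned frames
frames = rng.standard_normal((T, FEAT_DIM))
x = condition_on_speaker(frames, speaker_id=1)
for d in (1, 2, 4, 8):
    w = rng.standard_normal((3, x.shape[1], 64)) * 0.05
    x = dilated_conv1d(x, w, dilation=d)

print(x.shape)  # one hidden feature vector per input frame: (100, 64)
```

Doubling the dilation at each layer grows the receptive field exponentially with depth, which is the property the WaveNet-style post-net exploits to model longer-range spectral structure than plain convolutions of the same depth.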
Keywords
Hidden Markov models, Speech synthesis, Data models, Synthesizers, Convolution, Decoding