StyleFormerGAN-VC: Improving Effect of few shot Cross-Lingual Voice Conversion Using VAE-StarGAN and Attention-AdaIN

2022 IEEE/ACIS 23rd International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (2022)

Abstract
Voice conversion (VC) aims to transfer speaker timbre while retaining the lexical content of the source speech, and it has attracted much attention lately. Although previous VC models have achieved good performance, instability cannot be avoided in cross-lingual scenarios. In this paper, we propose StyleFormerGAN-VC to achieve better cross-lingual voice conversion: a variational auto-encoder is introduced to model the feature distribution of cross-lingual utterances, and adversarial training is applied to improve speech quality. In addition, we combine the attention mechanism with AdaIN so that the model generalizes better to unseen speakers with long utterances. Experiments show that our model performs stably in the cross-lingual scenario and achieves good MOS evaluation scores.
Keywords
Voice conversion, StarGAN, multi-head attention, VAE