Adapting Pretrained Models for Adult to Child Voice Conversion

2023 31st European Signal Processing Conference (EUSIPCO)(2023)

Abstract
Due to the widespread lack of parallel data for adult-to-child voice conversion (VC), non-parallel VC techniques have grown in popularity. Methods such as the encoder-decoder model have achieved good performance in adult-to-adult VC. This architecture provides flexibility: each module can be trained separately, or pretrained models can be exploited. However, these pretrained models are available only for adult speech. For children's speech, there is not enough data to train all the modules of a robust encoder-decoder-based VC system. In a limited-data scenario, only the decoder module can be trained using target speech. Specifically, we find that adult-to-child VC using a pretrained encoder and a decoder trained on child speech does not reproduce the spectral variability of child speech, because of the gross spectral mismatch between adult and child speech. We address this mismatch by exploiting a warping mechanism that modifies the acoustic attributes based on child speech. We conduct objective and subjective evaluations on the CMU and CSLU kids corpora and on data from one adult actress. Results show that the proposed method reduces mel-cepstral distortion (MCD) and F0 root-mean-square error (RMSE) by 0.67 and 0.03, respectively. In subjective evaluations, we observe relative mean opinion score (MOS) improvements of 10.7% for naturalness and 18.23% for similarity.
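The two objective metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. It uses the commonly cited definitions of MCD and F0 RMSE and assumes that the reference and converted utterances are already time-aligned frame by frame (for example by dynamic time warping), that the 0th cepstral coefficient is excluded, and that unvoiced frames are marked with F0 = 0. The function names are hypothetical, and the abstract does not state the units or alignment procedure behind the reported numbers.

```python
import numpy as np


def mel_cepstral_distortion(ref_mcep, conv_mcep):
    """Mean MCD in dB over frames (assumes frames are already aligned).

    Both inputs have shape (frames, coefficients). Column 0 is treated as
    the energy coefficient and excluded, which is a common convention and
    an assumption here.
    """
    ref = np.asarray(ref_mcep, dtype=float)
    conv = np.asarray(conv_mcep, dtype=float)
    if ref.shape != conv.shape:
        raise ValueError("inputs must be aligned to the same shape")
    diff = ref[:, 1:] - conv[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


def f0_rmse(ref_f0, conv_f0):
    """RMSE between two F0 tracks over frames voiced in both tracks.

    Frames with F0 <= 0 are treated as unvoiced (an assumption).
    """
    ref = np.asarray(ref_f0, dtype=float)
    conv = np.asarray(conv_f0, dtype=float)
    if ref.shape != conv.shape:
        raise ValueError("inputs must be aligned to the same shape")
    voiced = (ref > 0) & (conv > 0)
    if not voiced.any():
        raise ValueError("no frames are voiced in both tracks")
    return float(np.sqrt(np.mean((ref[voiced] - conv[voiced]) ** 2)))
```

Lower values of both metrics indicate that the converted speech is closer to the target, which is why the abstract reports reductions.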
Keywords
Child speech, adult speech, voice conversion, encoder-decoder model