Voice Conversion Using Deep Bidirectional Long Short-Term Memory Based Recurrent Neural Networks

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2015)

Cited by 320 | Views 147
Abstract
This paper investigates the use of Deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks (DBLSTM-RNNs) for voice conversion. Temporal correlations across speech frames are not directly modeled in frame-based methods using conventional Deep Neural Networks (DNNs), which limits the quality of the converted speech. To improve the naturalness and continuity of the speech output in voice conversion, we propose a sequence-based conversion method using DBLSTM-RNNs to model not only the frame-wise relationship between the source and the target voice, but also the long-range context dependencies in the acoustic trajectory. Experiments show that DBLSTM-RNNs outperform DNNs, with Mean Opinion Scores of 3.2 and 2.3, respectively. In addition, DBLSTM-RNNs without dynamic features perform better than DNNs with dynamic features.
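
The paper itself provides no code; the following is a minimal PyTorch-style sketch of the sequence-based mapping the abstract describes: a stacked bidirectional LSTM that converts a source speaker's spectral feature trajectory into target-speaker features frame by frame, with the recurrent layers supplying context in both time directions. The feature dimension, layer sizes, and the MSE training objective here are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class DBLSTMConverter(nn.Module):
    """Sketch of a deep bidirectional LSTM voice-conversion model.

    Maps a sequence of source spectral features to target features of
    the same length; the bidirectional recurrent layers see both past
    and future frames when predicting each output frame. Dimensions
    below are illustrative assumptions, not the paper's settings.
    """

    def __init__(self, feat_dim=25, hidden_dim=128, num_layers=3):
        super().__init__()
        self.blstm = nn.LSTM(
            input_size=feat_dim,
            hidden_size=hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=True,   # forward + backward context per frame
        )
        # Forward and backward hidden states are concatenated -> 2 * hidden_dim
        self.proj = nn.Linear(2 * hidden_dim, feat_dim)

    def forward(self, src):           # src: (batch, frames, feat_dim)
        out, _ = self.blstm(src)      # out: (batch, frames, 2 * hidden_dim)
        return self.proj(out)         # converted: (batch, frames, feat_dim)

# Minimal training step on one parallel, time-aligned utterance pair (dummy data).
model = DBLSTMConverter()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

src = torch.randn(1, 200, 25)   # 200 frames of source spectral features
tgt = torch.randn(1, 200, 25)   # time-aligned target spectral features

optimizer.zero_grad()
loss = criterion(model(src), tgt)
loss.backward()
optimizer.step()

In practice the source-target frame alignment would come from dynamic time warping over parallel training utterances; the dummy tensors above stand in for that step.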
Keywords
voice conversion,bidirectional long short-term memory,recurrent neural networks,dynamic features