Cute: A Concatenative Method For Voice Conversion Using Exemplar-Based Unit Selection

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016

Abstract
State-of-the-art voice conversion methods re-synthesize voice from spectral representations such as MFCCs and STRAIGHT, thereby introducing muffled artifacts. We propose a method that circumvents this concern using concatenative synthesis coupled with exemplar-based unit selection. Given parallel speech from source and target speakers as well as a new query from the source, our method stitches together pieces of the target voice. It optimizes for three goals: matching the query, using long consecutive segments, and smooth transitions between the segments. To achieve these goals, we perform unit selection at the frame level and introduce triphone-based preselection that greatly reduces computation and enforces selection of long, contiguous pieces. Our experiments show that the proposed method has better quality than baseline methods, while preserving high individuality.
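The three goals named in the abstract (matching the query, using long consecutive segments, and smooth transitions) can be illustrated with a small dynamic-programming search over exemplar frames. The sketch below is not the paper's implementation: the function names, the squared-distance costs, and the `break_penalty` parameter are all assumptions chosen for illustration, and the triphone-based preselection step is omitted, so every exemplar frame is a candidate at every step.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_units(query, source_frames, target_frames, break_penalty=1.0):
    """Viterbi-style search for one exemplar index per query frame.

    query: feature vectors of the new source utterance.
    source_frames, target_frames: parallel exemplar features, assumed to be
    time-aligned so that index k refers to the same moment in both.
    break_penalty: hypothetical constant charged for every non-consecutive jump.
    """
    n = len(source_frames)
    # cost[k]: cheapest path so far that ends at exemplar frame k.
    cost = [sq_dist(query[0], source_frames[k]) for k in range(n)]
    back = []
    for t in range(1, len(query)):
        new_cost, pointers = [], []
        for k in range(n):
            best_j, best_val = 0, float("inf")
            for j in range(n):
                if k == j + 1:
                    # Consecutive frames continue a segment at no cost.
                    trans = 0.0
                else:
                    # A jump pays the penalty plus a smoothness term
                    # measured on the target side.
                    trans = break_penalty + sq_dist(target_frames[j], target_frames[k])
                if cost[j] + trans < best_val:
                    best_j, best_val = j, cost[j] + trans
            # Matching cost: how well exemplar k explains query frame t.
            new_cost.append(best_val + sq_dist(query[t], source_frames[k]))
            pointers.append(best_j)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final state.
    k = min(range(n), key=cost.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return path[::-1]
```

The returned indices would then be used to pick the corresponding pieces of the target recording for concatenation. This exhaustive search costs O(T·n²) for T query frames and n exemplar frames, which suggests why a preselection step that narrows the candidates matters in practice.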
Keywords
Voice conversion, unit selection, concatenative synthesis, exemplar-based