Novel Metric Learning for Non-parallel Voice Conversion

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2019)

Abstract
Obtaining aligned spectral pairs from non-parallel data for a stand-alone Voice Conversion (VC) technique is a challenging research problem. An unsupervised alignment algorithm, namely, the Iterative combination of a Nearest Neighbor search step and a Conversion step Alignment (INCA), iteratively tries to align the spectral features by minimizing the Euclidean distance between the intermediate converted and the target spectral feature vectors. However, the Euclidean distance may not correlate well with the perceptual distance between two (sound or visual) patterns in a given feature space. In this paper, we propose to learn a distance metric using the Large Margin Nearest Neighbor (LMNN) technique, which yields a small distance for the same phoneme uttered by different speakers and a larger distance for different phonemes. This learned metric is then used to find the NN pairs in INCA. Furthermore, we propose to use the learned metric only for the first iteration of INCA, since the intermediate converted features (which are not actual acoustic features) may not behave well w.r.t. the learned metric. We obtained, on average, a 7.93% relative improvement in Phonetic Accuracy (PA). This improvement is also reflected in the subjective and objective evaluations.
Keywords
VC, INCA, Metric Learning, LMNN
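
The alignment loop described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`nn_pairs`, `fit_affine`, `apply_affine`, `inca_align`) are hypothetical, the matrix `L` is assumed to be a linear transform already learned by LMNN (so that the learned distance is the Euclidean distance after mapping x to Lx), the nearest-neighbor search is done in one direction only, and a plain least-squares affine map stands in for whatever conversion model the paper actually uses.

```python
import numpy as np

def nn_pairs(src, tgt, L=None):
    """Index of the nearest target frame for every source frame.

    If L is given, distances are measured after the linear map x -> L @ x,
    i.e. d(x, y)^2 = (x - y)^T L^T L (x - y). With L=None this is plain
    Euclidean nearest-neighbor search.
    """
    if L is not None:
        src, tgt = src @ L.T, tgt @ L.T
    # Squared pairwise distances, shape (len(src), len(tgt)).
    d2 = (src ** 2).sum(1)[:, None] - 2.0 * src @ tgt.T + (tgt ** 2).sum(1)[None, :]
    return d2.argmin(axis=1)

def fit_affine(src, tgt):
    """Least-squares affine map from src to tgt (assumed stand-in conversion)."""
    X = np.hstack([src, np.ones((len(src), 1))])
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    return W

def apply_affine(W, x):
    return np.hstack([x, np.ones((len(x), 1))]) @ W

def inca_align(src, tgt, L=None, n_iter=5):
    """INCA-style loop (n_iter >= 1): learned metric in iteration 0 only."""
    converted = src
    for it in range(n_iter):
        # Learned metric only on real acoustic features (first iteration);
        # later iterations fall back to the Euclidean distance.
        metric = L if it == 0 else None
        idx = nn_pairs(converted, tgt, metric)
        W = fit_affine(src, tgt[idx])      # conversion step on current pairs
        converted = apply_affine(W, src)   # intermediate converted features
    return idx, W
```

Calling `inca_align(src, tgt, L)` with two arrays of spectral feature vectors (one row per frame) returns the final frame pairing and the fitted map. Restricting `L` to the first iteration mirrors the abstract's argument that intermediate converted features may not behave well under the learned metric.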