Voice conversion towards modeling dynamic characteristics using switching state space model

Ning Xu, JingYi Bao, XiaoFeng Liu, AiMing Jiang, YiBing Tang

SCIENCE CHINA Information Sciences (2013)

Abstract
In the voice conversion (VC) literature, the method based on the statistical Gaussian mixture model (GMM) serves as a benchmark. However, an inherent drawback of the GMM is the well-known discontinuity problem: features are transformed on a frame-by-frame basis, which ignores the dynamics between adjacent frames and ultimately degrades the quality of the converted speech. A variety of algorithms have been proposed to overcome this deficiency, among which the method based on the state space model (SSM) has shown promising results. In this paper, we present an enhanced version of the traditional SSM, namely the switching SSM (SSSM). This new structure is more flexible than the conventional one in that it allows a mixture of components to account for rapid transitions between neighboring frames. Moreover, the physical meaning of the SSSM model parameters is examined in depth, leading to efficient application-specific training and transformation procedures for VC. Experiments with both objective and subjective measurements were conducted to compare the performance of the conventional and the proposed SSM-based methods; the results show that SSSM yields clear improvements in both similarity and quality.
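To make the contrast in the abstract concrete, here is a minimal sketch on scalar features, assuming made-up two-component parameters. The names `GMM`, `SSM`, `posteriors`, `gmm_convert` and `switching_ssm_convert` are hypothetical. `gmm_convert` is the standard frame-wise joint-GMM mapping, y = sum_m p(m|x) [mu_y^m + (sigma_yx^m / sigma_xx^m)(x - mu_x^m)], so each output depends only on its own frame. `switching_ssm_convert` is a toy switching linear state space mapping in which a hidden state is carried across frames and the per-component updates are blended by the same posteriors. It is not the paper's SSSM formulation, training, or transformation procedure, and it omits noise terms and Kalman filtering.

```python
import math

# Hypothetical scalar joint-GMM parameters (illustrative only, not from the paper).
# Each component: (weight, mu_x, mu_y, var_xx, cov_yx)
GMM = [
    (0.5, -1.0, -2.0, 0.5, 0.4),
    (0.5, 1.0, 2.5, 0.5, 0.3),
]

# Hypothetical per-component state space parameters (a, b, c, d):
#   hidden state  z_t = a * z_{t-1} + b * x_t
#   output        y_t = c * z_t + d
SSM = [
    (0.6, 0.4, 2.0, -1.0),
    (0.6, 0.4, 1.5, 1.0),
]


def posteriors(x):
    """Component posteriors p(m | x) under the source marginal, computed in log space."""
    logs = [
        math.log(w) - 0.5 * math.log(2 * math.pi * vxx) - 0.5 * (x - mx) ** 2 / vxx
        for (w, mx, _, vxx, _) in GMM
    ]
    top = max(logs)
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def gmm_convert(xs):
    """Frame-by-frame GMM mapping: each output uses only the current source frame."""
    ys = []
    for x in xs:
        post = posteriors(x)
        y = sum(
            p * (my + cyx / vxx * (x - mx))
            for p, (_, mx, my, vxx, cyx) in zip(post, GMM)
        )
        ys.append(y)
    return ys


def switching_ssm_convert(xs):
    """Toy switching mapping: per-component state updates are blended by p(m | x_t),
    and the hidden state z carries information from earlier frames."""
    z = 0.0
    ys = []
    for x in xs:
        post = posteriors(x)
        z = sum(p * (a * z + b * x) for p, (a, b, _, _) in zip(post, SSM))
        y = sum(p * (c * z + d) for p, (_, _, c, d) in zip(post, SSM))
        ys.append(y)
    return ys
```

Feeding the same final frame after different histories shows the difference: `gmm_convert([0.3])[0]` equals `gmm_convert([5.0, 0.3])[1]`, while the two corresponding `switching_ssm_convert` outputs differ because the hidden state remembers the earlier frame. Collapsing the per-component updates into a single blended state is a crude simplification chosen for brevity; a real switching model would track a proper posterior over regimes and states.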
Keywords
Gaussian mixture model, switching state space model, discontinuity problem, voice conversion, dynamic characteristics