Novel Inter Mixture Weighted GMM Posteriorgram for DNN and GAN-based Voice Conversion.
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (2018)
Abstract
Voice Conversion (VC) requires aligning the spectral features before learning the mapping function, because speaking rates vary across the source and target speakers. To address this issue, training two parallel networks that use a speaker-independent representation has been proposed. In this paper, we explore the unsupervised Gaussian Mixture Model (GMM) posteriorgram as such a speaker-independent representation. However, in the GMM posteriorgram, the same phonetic information is spread across more than one component because of speaking-style variations across speakers. In particular, for a given phone this spread is limited to a group of neighboring components. We therefore propose to share the posterior probability of each component with a limited number of neighboring components, sorted by Kullback-Leibler (KL) divergence. We employ Deep Neural Network (DNN)-based and Generative Adversarial Network (GAN)-based frameworks to measure the effectiveness of the proposed Inter Mixture Weighted GMM (IMW GMM) posteriorgram on the Voice Conversion Challenge (VCC) 2016 database. Compared with the GMM posteriorgram, the proposed IMW GMM posteriorgram yields relative improvements of 13.73% and 5.25% in the speech quality and the speaker similarity of the converted voices, respectively.
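The sharing step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so everything below is an assumption made for illustration: diagonal-covariance components, a symmetrised KL divergence for ranking neighbors, an equal split of the shared mass among the `k` nearest components, and the hypothetical names `kl_diag_gauss`, `neighbours_by_kl`, `share_posteriors`, and `self_weight`. It is not the authors' formulation.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def neighbours_by_kl(means, variances, k):
    """For each component, return the indices of its k closest components.

    Assumption: closeness is the symmetrised KL divergence
    KL(i || j) + KL(j || i); the abstract only says "KL divergence".
    """
    n = len(means)
    result = []
    for i in range(n):
        dists = []
        for j in range(n):
            if j == i:
                continue
            d = kl_diag_gauss(means[i], variances[i], means[j], variances[j])
            d += kl_diag_gauss(means[j], variances[j], means[i], variances[i])
            dists.append((d, j))
        dists.sort()
        result.append([j for _, j in dists[:k]])
    return result


def share_posteriors(posterior, neighbours, self_weight=0.5):
    """Spread each component's posterior over its neighbours.

    Assumption: a component keeps `self_weight` of its posterior and
    splits the rest equally among its neighbours; the frame is then
    renormalised so that it sums to one.
    """
    shared = [0.0] * len(posterior)
    for i, p in enumerate(posterior):
        shared[i] += self_weight * p
        for j in neighbours[i]:
            shared[j] += (1.0 - self_weight) * p / len(neighbours[i])
    total = sum(shared)
    return [s / total for s in shared]


# Toy example: components 0 and 1 are close, component 2 is far away.
means = [[0.0], [0.1], [5.0]]
variances = [[1.0], [1.0], [1.0]]
nb = neighbours_by_kl(means, variances, k=1)
frame = share_posteriors([1.0, 0.0, 0.0], nb)
```

In this toy case a frame assigned entirely to component 0 ends up shared with its nearest neighbor, component 1, while the distant component 2 receives nothing, which mirrors the idea that phonetic information spreads only over a small group of neighboring components.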
Keywords
IMW GMM Posteriorgram, generative adversarial network, voice conversion