A Novel Attention-Guided Generative Adversarial Network for Whisper-to-Normal Speech Conversion

arXiv (2023)

Abstract
Whispered speech is a special voicing style that speakers use in public to keep the content of their speech private. It is also the primary means of oral communication for aphonic individuals who have undergone laryngectomy. Converting whispered speech to normal-voiced speech can significantly improve speech quality and/or intelligibility for whisper perception or recognition. Because the voicing styles of normal and whispered speech differ so markedly, estimating normal-voiced speech from its whispered counterpart remains a major challenge. Existing whisper-to-normal speech conversion methods learn a nonlinear mapping between features of whispered speech and those of its normal counterpart, and reconstruct the converted speech from features that the learned function selects from the training data space. These methods may produce a spectrum that is discontinuous across successive frames, which degrades the quality and/or intelligibility of the converted speech. This paper proposes a novel generative model (AGAN-W2SC) for whisper-to-normal speech conversion. Unlike feature-mapping models, AGAN-W2SC generates a normal speech spectrum from a whispered spectrum. To make the generated spectrum more similar to the reference normal speech, the model captures both the inner-feature coherence of a whisper and the inter-feature coherence between whispered speech and its normal counterpart. Specifically, a self-attention mechanism captures the inner-spectrum structure, while a Siamese neural network captures the inter-spectrum structure across domains. Additionally, the model adopts identity mapping to preserve linguistic information. AGAN-W2SC requires no parallel data and can be trained at the frame level. Experimental results on whisper-to-normal speech conversion show that AGAN-W2SC outperforms all compared methods in terms of speech quality and intelligibility.
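The self-attention and identity-mapping ideas named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the AGAN-W2SC architecture, so everything here is an assumption made for illustration: the function names (`self_attention`, `identity_loss`), the random projection matrices, the choice to attend across spectrum frames, and the use of an L1 penalty for the identity term.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(spec, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the rows of a spectrum.

    spec: array of shape (frames, bins); w_q, w_k, w_v: (bins, d).
    Returns the attended features (frames, d) and the attention
    weights (frames, frames). Hypothetical helper, not the paper's code.
    """
    q, k, v = spec @ w_q, spec @ w_k, spec @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def identity_loss(generator, normal_spec):
    """L1 identity-mapping penalty (assumed form): feeding normal speech
    to the whisper-to-normal generator should leave it unchanged."""
    return float(np.mean(np.abs(generator(normal_spec) - normal_spec)))

# Toy data: 8 frames, 16 frequency bins, projection size 4 (all arbitrary).
rng = np.random.default_rng(0)
spec = rng.standard_normal((8, 16))
w_q, w_k, w_v = (rng.standard_normal((16, 4)) for _ in range(3))

attended, weights = self_attention(spec, w_q, w_k, w_v)
perfect_identity = identity_loss(lambda x: x, spec)
```

Each row of `weights` sums to one, so every output row is a convex combination of the value vectors; a generator that already acts as the identity on normal speech incurs zero identity loss.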
Keywords
Whisper-to-normal speech conversion,Generative adversarial networks,Attention mechanism,Siamese neural network