Effectiveness Of Cross-Domain Architectures For Whisper-To-Normal Speech Conversion

2019 27th European Signal Processing Conference (EUSIPCO), 2019

Abstract
Though whispering is a typical mode of natural speech communication, it differs from normal speech in both production and perception. Recently, the authors proposed a Generative Adversarial Network (GAN)-based architecture, namely DiscoGAN, to discover such cross-domain relationships for whisper-to-normal speech (WHSP2SPCH) conversion. In this paper, we extend that study with detailed theory and analysis. In addition, we propose a Cycle-consistent Adversarial Network (CycleGAN) for cross-domain WHSP2SPCH conversion. We observe that the proposed systems yield objective results comparable to the baseline and are superior in terms of fundamental frequency (F0) prediction. Moreover, in the subjective evaluations, the proposed cross-domain architectures were preferred 55.75% more often (on average) than the traditional GAN. This indicates that the proposed methods yield more natural-sounding normal speech converted from whispered speech.
Keywords
Whisper, Normal Speech, Cross-domain, GAN, DiscoGAN, CycleGAN