Speech Intelligibility Enhancement Using Non-Parallel Speaking Style Conversion with StarGAN and Dynamic Range Compression

2020 IEEE International Conference on Multimedia and Expo (ICME), 2020

Abstract
Speech intelligibility enhancement is a perceptual enhancement technique for clean speech reproduced in noisy environments. It is typically used in the listening stage of multimedia communications. In this study, we enhance speech intelligibility through speaking style conversion (SSC), a data-driven approach inspired by a vocal mechanism known as the Lombard effect. The proposed SSC method combines star generative adversarial network (StarGAN) based mapping with dynamic range compression (DRC). It has two main advantages: 1) unlike the gender-independent conversion used in previous studies, StarGAN can learn the speech features of each gender separately, providing gender-specific conversion with a single model and non-parallel training data; 2) we design a multi-level enhancement strategy that uses DRC within the StarGAN architecture, which improves SSC performance under strong noise interference. Experiments show that our method outperforms baseline methods.
Keywords
speech intelligibility, Lombard effect, speaking style conversion (SSC), StarGAN, dynamic range compression (DRC)
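
The abstract names dynamic range compression (DRC) as one stage of the method but does not describe it. As a rough illustration only, the sketch below shows a generic static (memoryless) compressor followed by RMS renormalization. This is not the DRC design used in the paper. The function names, the threshold of -20 dB, and the 4:1 ratio are all assumptions chosen for the example.

```python
import math


def compress_dynamic_range(samples, threshold_db=-20.0, ratio=4.0):
    """Generic static compressor (illustrative, not the paper's DRC).

    Samples whose level exceeds threshold_db are attenuated so that the
    output level rises by only 1/ratio dB per input dB above the threshold.
    """
    out = []
    for x in samples:
        mag = abs(x)
        if mag == 0.0:
            out.append(0.0)  # silence stays silence; avoids log10(0)
            continue
        level_db = 20.0 * math.log10(mag)
        if level_db > threshold_db:
            # Negative gain that grows with the overshoot above the threshold.
            gain_db = (threshold_db - level_db) * (1.0 - 1.0 / ratio)
        else:
            gain_db = 0.0
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out


def normalize_rms(samples, target_rms):
    """Rescale samples so their root-mean-square value equals target_rms."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        return list(samples)
    scale = target_rms / rms
    return [x * scale for x in samples]


def rms_of(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


if __name__ == "__main__":
    signal = [1.0, 0.1, 0.01, -1.0, 0.0]  # hypothetical toy samples
    compressed = compress_dynamic_range(signal)
    # Restore the original RMS so the overall energy is unchanged.
    restored = normalize_rms(compressed, rms_of(signal))
    print(compressed)
    print(restored)
```

Renormalizing to the original RMS after compression keeps the overall energy the same while raising quiet segments relative to loud ones. A practical compressor would typically operate on a smoothed signal envelope rather than on individual samples.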