CST: Complex Sparse Transformer for Low-SNR Speech Enhancement.

Sensors (Basel, Switzerland), 2023

Abstract
Speech enhancement for audio with a low SNR is challenging. Existing speech enhancement methods are designed mainly for high-SNR audio and usually use RNNs to model audio sequence features, which prevents the model from learning long-distance dependencies and thus limits its performance on low-SNR speech enhancement tasks. We design a complex transformer module with sparse attention to overcome this problem. Unlike the traditional transformer, our model is extended to effectively model complex-domain sequences: it uses a sparse attention mask to balance the model's attention between long-distance and nearby relations, introduces a pre-layer positional embedding module to enhance the model's perception of position information, and adds a channel attention module so that the model can dynamically adjust the weight distribution between channels according to the input audio. Experimental results show that, in low-SNR speech enhancement tests, our model achieves noticeable improvements in both speech quality and intelligibility.
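The abstract does not specify the exact sparsity pattern of the attention mask. A minimal sketch of one plausible pattern that balances nearby and long-distance relations, combining a local band with strided global positions (the function and parameter names `local_window` and `stride` are illustrative assumptions, not from the paper), could look like this in NumPy:

```python
import numpy as np

def sparse_attention_mask(seq_len, local_window=4, stride=8):
    """Build a boolean attention mask (True = attention allowed).

    Combines two patterns:
      - a local band, so each position attends to its neighbors
        (nearby relations);
      - strided global positions, so every position also attends
        to a regular subset of distant frames (long-distance relations).

    This is a hypothetical sketch, not the paper's exact mask.
    """
    idx = np.arange(seq_len)
    # Local band: |i - j| <= local_window
    band = np.abs(idx[:, None] - idx[None, :]) <= local_window
    # Strided links: every position may attend to columns j with j % stride == 0
    strided = np.broadcast_to((idx % stride) == 0, (seq_len, seq_len))
    return band | strided
```

Such a mask would typically be applied inside scaled dot-product attention by setting disallowed score entries to a large negative value before the softmax, so their attention weights become effectively zero.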
Keywords
UNET architecture,attention mechanisms,speech enhancement,transformer