Cleanformer: A Multichannel Array Configuration-Invariant Neural Enhancement Frontend for ASR in Smart Speakers

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)

Abstract
This work introduces Cleanformer, a streaming multichannel neural enhancement frontend for automatic speech recognition (ASR). The model has a Conformer-based architecture that takes as input one channel each of the raw and enhanced signals and uses self-attention to derive a time-frequency mask. The enhanced input is generated by a multichannel adaptive noise cancellation algorithm known as Speech Cleaner. The time-frequency mask is applied to the noisy input to produce enhanced features for ASR. Detailed evaluations with speech and non-speech noise show a significant reduction in word error rate (WER), about 80% at -6 dB SNR, relative to a state-of-the-art ASR model alone. Cleanformer also significantly outperforms enhancement using a beamformer with ideal steering. The enhancement model can be used with different microphone arrays without retraining.
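The masking step described above (a time-frequency mask applied to the noisy input) can be illustrated with a minimal NumPy sketch. It assumes magnitude-domain masking and uses the ideal ratio mask, a standard oracle masking target, as the mask. In Cleanformer the mask is instead predicted by the Conformer from the raw and Speech Cleaner-enhanced channels, which this sketch does not model. The function names `ideal_ratio_mask` and `apply_mask`, the exponent `beta=0.5`, and the toy additive-magnitude mixture are illustrative assumptions, not the paper's code.

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5, eps=1e-8):
    """Hypothetical helper: ideal ratio mask (IRM) per time-frequency bin.

    Uses the common definition IRM = (|S|^2 / (|S|^2 + |N|^2)) ** beta,
    which assumes speech and noise are uncorrelated.
    """
    s2 = clean_mag ** 2
    n2 = noise_mag ** 2
    return (s2 / (s2 + n2 + eps)) ** beta

def apply_mask(noisy_mag, mask):
    """Hypothetical helper: element-wise masking of the noisy magnitude."""
    # Clip so the mask only attenuates, never amplifies, a bin.
    return np.clip(mask, 0.0, 1.0) * noisy_mag

# Toy example: 4 frames x 257 frequency bins of random magnitudes.
rng = np.random.default_rng(0)
frames, bins = 4, 257
clean = np.abs(rng.normal(size=(frames, bins)))
noise = np.abs(rng.normal(size=(frames, bins)))
noisy = clean + noise  # additive magnitudes: a simplification for illustration only

mask = ideal_ratio_mask(clean, noise)
enhanced = apply_mask(noisy, mask)
print(enhanced.shape)  # (4, 257)
```

In a real frontend the masked magnitudes would typically be converted to log-mel features before being passed to the ASR model; that step is omitted here.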
Keywords
automatic speech recognition, noise robust ASR, adaptive noise cancellation, noise context, speech enhancement, ideal ratio mask