Robust MVDR Beamforming Using Time-Frequency Masks for Online/Offline ASR in Noise
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016
Abstract
This paper considers acoustic beamforming for noise-robust automatic speech recognition (ASR). A beamformer attenuates background noise by enhancing sound components coming from a direction specified by a steering vector. Hence, accurate steering vector estimation is paramount for successful noise reduction. Recently, a beamforming approach was proposed that employs time-frequency masks. The speech recognition system we submitted to the CHiME-3 Challenge employed a new form of this approach, which uses a speech spectral model based on a complex Gaussian mixture model (CGMM) to estimate the time-frequency masks and the steering vector, but the technical details were not provided at the time. This paper elaborates on this technique and examines its effectiveness for ASR. Experimental results show that the CGMM-based approach outperforms a recently proposed mask estimator based on a Watson mixture model. In addition, the CGMM-based approach is extended to an online speech enhancement scenario, which allows the technique to be used in an online recognition setup. This online version reduces the CHiME-3 evaluation error rate from 15.60% to 8.47%, an improvement comparable to that obtained by batch processing.
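As a rough illustration of the pipeline the abstract describes (masks, then steering vector, then MVDR filter), the following NumPy sketch processes a single frequency bin. It is a minimal sketch under stated assumptions, not the paper's implementation: the masks here are oracle masks on synthetic data rather than CGMM estimates, the steering vector is taken as the principal eigenvector of a speech covariance obtained by subtracting the noise covariance from the noisy-speech covariance (one common choice), and the function names, the diagonal loading constant, and the toy signal model are all hypothetical.

```python
import numpy as np

def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    Y:    (channels, frames) complex STFT coefficients
    mask: (frames,) weights in [0, 1]
    """
    weighted = Y * mask  # mask broadcasts across channels
    return weighted @ Y.conj().T / max(mask.sum(), 1e-10)

def steering_vector(R_speech):
    """Assumed estimator: principal eigenvector of the speech covariance."""
    _, eigvecs = np.linalg.eigh(R_speech)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def mvdr_weights(h, R_noise, loading=1e-6):
    """MVDR filter w = R_n^{-1} h / (h^H R_n^{-1} h)."""
    M = R_noise.shape[0]
    # Small diagonal loading (an assumption) keeps the solve well conditioned.
    R_n = R_noise + loading * np.trace(R_noise).real / M * np.eye(M)
    num = np.linalg.solve(R_n, h)
    return num / (h.conj() @ num)

# Toy data: 4 microphones, speech active only in the first half of the frames.
rng = np.random.default_rng(0)
M, T = 4, 2000
a = np.exp(1j * rng.uniform(0, 2 * np.pi, M))           # hypothetical true steering vector
speech = rng.normal(size=T) + 1j * rng.normal(size=T)
speech[T // 2:] = 0
noise = 0.5 * (rng.normal(size=(M, T)) + 1j * rng.normal(size=(M, T)))
Y = np.outer(a, speech) + noise

speech_mask = (np.arange(T) < T // 2).astype(float)     # oracle mask, stands in for CGMM output
noise_mask = 1.0 - speech_mask

R_noisy = masked_covariance(Y, speech_mask)
R_noise = masked_covariance(Y, noise_mask)
R_speech = R_noisy - R_noise                            # assumed speech covariance estimate

h = steering_vector(R_speech)
w = mvdr_weights(h, R_noise)
enhanced = w.conj() @ Y                                 # beamformer output, shape (frames,)
```

By construction the filter satisfies the distortionless constraint, so `np.vdot(w, h)` equals 1 up to numerical precision; replacing the oracle masks with estimated ones changes only how `R_noisy` and `R_noise` are formed.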
Keywords
Noise robust speech recognition, speech enhancement, beamforming, CHiME-3