Non-linear estimation of voice activity to improve automatic recognition of noisy speech
INTERSPEECH (2005)
Abstract
Feed-forward multi-layer perceptrons (MLPs) and recurrent neural networks (RNNs), fed with different sets of acoustic features, are proposed for estimating the presence or absence of speech in continuous speech signals at various levels of background noise. Detailed voice activity detection (VAD) performance evaluations are reported on the Aurora2, Aurora3 and TIMIT corpora. The best results are obtained with an RNN fed by the acoustic features used for automatic speech recognition (ASR), augmented by specific features. Detailed ASR evaluations are also reported using Aurora2 and the German, Italian and Spanish portions of the Aurora3 test set. The highest word error rate (WER) reduction (16.9%) is obtained when the noise-only presence probability is used to modify the phone posterior probabilities used for speech decoding.
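The abstract does not state how the probability that a frame contains only noise is combined with the phone posteriors. As a rough illustration only, the sketch below assumes one simple possibility: each frame's posterior vector is linearly mixed with a distribution that puts all mass on a silence class, weighted by that probability. The function name `reweight_posteriors`, the `silence_index` parameter and the mixing rule itself are hypothetical and are not taken from the paper.

```python
def reweight_posteriors(posteriors, p_noise, silence_index=0):
    """Mix per-frame phone posteriors with a silence-only distribution.

    ASSUMPTION (not from the paper): the adjusted posterior is
        (1 - p_noise) * posterior + p_noise * one_hot(silence_index)
    which keeps every frame normalized when the input frame is normalized.

    posteriors: list of frames, each a list of phone posterior probabilities.
    p_noise:    list with one noise-only probability per frame, in [0, 1].
    """
    if len(posteriors) != len(p_noise):
        raise ValueError("need exactly one noise probability per frame")

    adjusted = []
    for frame, pn in zip(posteriors, p_noise):
        if not 0.0 <= pn <= 1.0:
            raise ValueError("noise probability must lie in [0, 1]")
        # Scale down all phone posteriors by the probability of speech.
        new_frame = [(1.0 - pn) * p for p in frame]
        # Move the removed mass onto the (hypothetical) silence class.
        new_frame[silence_index] += pn
        adjusted.append(new_frame)
    return adjusted


if __name__ == "__main__":
    frames = [[0.2, 0.5, 0.3], [0.2, 0.5, 0.3]]
    # First frame is judged clean speech, second is half likely noise-only.
    print(reweight_posteriors(frames, [0.0, 0.5]))
```

Under this assumed rule, a frame with noise-only probability 0 is left unchanged, and a frame with probability 1 is forced entirely onto the silence class, so the decoder is discouraged from hypothesizing words in noise-only regions.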
Keywords
voice activity detection, multi-layer perceptron, difference set, feed-forward, recurrent neural network, word error rate, automatic speech recognition, posterior probability