Improved voice activity detection using static harmonic features

Acoustics Speech and Signal Processing (2010)

Abstract
Accurate voice activity detection (VAD) is important for robust automatic speech recognition (ASR) systems. We previously proposed a statistical-model-based VAD that uses long-term temporal information in speech and shows good robustness against noise in an automobile environment. To improve it further, this paper describes a new method that exploits harmonic structure information with statistical models. In our approach, local peaks considered to be harmonic structures are extracted without explicit pitch detection or voiced-unvoiced classification. The proposed method, which combines long-term temporal and static harmonic features, led to considerable improvements under low-SNR conditions in our VAD testing. In addition, the word error rate was reduced by 29.1% in a test that included a full ASR system.
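The abstract does not say how the local peaks are selected, so the following is only a minimal sketch of the general idea: mark spectral bins that stand out from their neighbours, with no pitch estimate and no voiced-unvoiced decision. The function names, the `min_rise` threshold, the neighbour-comparison rule, and the synthetic test frame are all assumptions made for illustration and are not taken from the paper.

```python
import cmath
import math


def log_power_spectrum(frame):
    """Naive DFT log-power spectrum of a real frame (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        # Small floor avoids log(0) in empty bins.
        spectrum.append(math.log(abs(acc) ** 2 + 1e-10))
    return spectrum


def local_peak_bins(log_spec, min_rise=1.0):
    """Return bins exceeding both neighbours by at least min_rise.

    Hypothetical selection rule: the paper's actual criterion is not
    given in the abstract.
    """
    peaks = []
    for k in range(1, len(log_spec) - 1):
        if (log_spec[k] - log_spec[k - 1] >= min_rise
                and log_spec[k] - log_spec[k + 1] >= min_rise):
            peaks.append(k)
    return peaks


# Synthetic "voiced" frame: four harmonics of DFT bin 8 in 128 samples.
N = 128
frame = [sum(math.sin(2 * math.pi * 8 * h * t / N) / h for h in range(1, 5))
         for t in range(N)]
peaks = local_peak_bins(log_power_spectrum(frame))
print(peaks)  # [8, 16, 24, 32]
```

In a statistical-model-based detector, a per-frame vector derived from such peaks could serve as the static harmonic feature alongside the long-term temporal features; how the paper actually forms and models that feature is not described here.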
Keywords
acoustic noise, speech processing, speech recognition, statistical analysis, ASR systems, automatic speech recognition systems, automobile environment, harmonic structure information, long term temporal speech information, static harmonic features, statistical model based VAD, statistical models, voice activity detection, harmonic structure, long-term temporal information, noise robustness