DNN-based causal voice activity detector

Information Theory and Applications Workshop (2016)

Citations: 27 | Views: 6
Abstract
Voice activity detectors (VADs) are important components of audio processing algorithms. In general, a VAD is a binary classifier that flags the audio frames containing voice activity. Most VADs are based on signal energy and build statistical models of the background noise and the speech signal. Deriving them analytically restricts us to simplified statistical models, which limits classification accuracy; more precise but more complex models make an analytical derivation of the solution practically impossible. In this paper, we propose using a deep neural network (DNN) to learn the relationship between noisy speech features and the correct VAD decision. Most applications need a causal algorithm, i.e., one that works in real time and uses only current and past audio samples. We therefore use audio segments that consist only of the current and previous audio frames, which makes real-time implementation possible. The proposed algorithm and DNN structure outperform the classic statistical-model-based VAD for both seen and unseen noise types.
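
The causal input described in the abstract, where each decision sees only the current and previous frames, can be illustrated with a minimal sketch. The abstract does not give the feature type, the number of past frames, or how the start of the signal is handled, so the function name `stack_causal_context`, the `num_past` parameter, and the choice to pad by repeating the first frame are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def stack_causal_context(features: np.ndarray, num_past: int) -> np.ndarray:
    """Stack each frame with its `num_past` predecessors (no future frames).

    features: array of shape (num_frames, num_bins), one feature vector per frame.
    Returns an array of shape (num_frames, (num_past + 1) * num_bins),
    with the oldest frame first and the current frame last in each row.
    """
    num_frames, _ = features.shape
    # Assumption: pad the start by repeating the first frame, so that the
    # earliest frames still get a full-length context window.
    head = np.repeat(features[:1], num_past, axis=0)
    padded = np.concatenate([head, features], axis=0)
    # Row t covers padded[t : t + num_past + 1], which corresponds to the
    # original frames t - num_past .. t. Nothing after frame t is read.
    windows = [padded[t:t + num_past + 1].reshape(-1) for t in range(num_frames)]
    return np.stack(windows)


# Hypothetical toy input: 4 frames with 3 feature bins each.
toy_features = np.arange(12, dtype=float).reshape(4, 3)
toy_context = stack_causal_context(toy_features, num_past=2)
```

Each row of `toy_context` would be the input vector for one frame-level decision. Because row `t` depends only on frames up to `t`, the same windowing can run frame by frame in a streaming setting.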