Voice Activity Detection for Live Speech of Baseball Game Based on Tandem Connection with Speech/Noise Separation Model.

Interspeech (2021)

Abstract
When voice activity detection (VAD) is applied to noisy audio, noise reduction (speech separation) and VAD are generally performed separately. In this case, the noise reduction may suppress the speech itself, and the VAD may not work well on the noise-reduced signal. This study proposes a VAD model built as a tandem connection of a neural network-based noise separation model and a VAD model. Because the two models are trained simultaneously, the noise separation model is expected to learn to take the VAD results into account, and thus to achieve effective noise separation. In turn, the improved speech/noise separation model is expected to improve the accuracy of the VAD model. This research targets real live speech from baseball games, which has a very poor signal-to-noise ratio. In the VAD experiments, tandemly connecting the speech/noise separation model and the VAD model improved the frame-level F1-score by 4.2 points.
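The reported result is a frame-level F1-score, so a small sketch of how such a score can be computed may help. This is not the authors' evaluation code: the function name `frame_f1`, the 0/1 label encoding (1 = speech, 0 = non-speech), and the toy label sequences are assumptions made for illustration, and "points" is assumed to mean F1 expressed on a 0 to 100 scale.

```python
def frame_f1(reference, predicted):
    """Frame-level F1 for binary VAD labels (assumed: 1 = speech, 0 = non-speech)."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same number of frames")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # speech detected as speech
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # non-speech detected as speech
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # speech that was missed
    if tp == 0:
        # No correctly detected speech frame: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-frame labels, not data from the paper.
reference = [0, 1, 1, 1, 0, 0, 1, 1]
predicted = [0, 1, 1, 0, 0, 1, 1, 1]
print(f"frame-level F1: {100 * frame_f1(reference, predicted):.1f}")
```

Under this reading, a 4.2-point improvement would correspond to the value printed above rising by 4.2 when the same reference labels are scored against the tandem model's predictions instead of the baseline's.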
Keywords
multi-task learning, speech/noise separation, tandem connection, voice activity detection