On the Importance of Event Detection for ASR

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016

Abstract
The performance of modern large vocabulary continuous speech recognition (LVCSR) systems is heavily affected by segment boundaries, by correct speaker identification of the segments, and by the removal of spurious data. We propose using Long Short-Term Memory (LSTM) recurrent neural networks to partition audio into speech segments and to track speaker turns. We additionally train an LSTM to identify music segments. We show that accurate event detection with our LSTM, together with the removal of silence and music, yields a 9-10% relative improvement in ASR performance. Secondary processing by speaker clustering provides a further gain in accuracy. We also report the event detection accuracy of the LSTM approach.
Keywords
Event Detection,Diarization,Automatic Speech Recognition,Long Short Term Memory,Music Detection
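
The abstract describes partitioning audio into speech segments and removing silence and music before recognition. The sketch below illustrates one way that step could look once per-frame event labels are available. It is not the authors' implementation: the label names, the 10 ms frame step, and the function `frames_to_segments` are all assumptions made for illustration, and the abstract does not specify the LSTM's output format.

```python
from itertools import groupby

# Assumed frame step in seconds; the abstract does not state one.
FRAME_SEC = 0.01


def frames_to_segments(labels, frame_sec=FRAME_SEC, keep=("speech",)):
    """Merge runs of identical per-frame labels into segments.

    Returns a list of (start_sec, end_sec, label) tuples, keeping only
    segments whose label is in `keep`. Frames labelled anything else
    (for example "silence" or "music") are dropped.
    """
    segments = []
    index = 0  # index of the first frame of the current run
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if label in keep:
            start = index * frame_sec
            end = (index + length) * frame_sec
            segments.append((start, end, label))
        index += length
    return segments


if __name__ == "__main__":
    # Hypothetical per-frame labels, such as a frame classifier might emit.
    frame_labels = (
        ["silence"] * 2 + ["speech"] * 3 + ["music"] * 2 + ["speech"] * 1
    )
    for start, end, label in frames_to_segments(frame_labels, frame_sec=1.0):
        print(f"{label}: {start:.1f}s to {end:.1f}s")
```

Only the retained speech segments would then be passed on to the recognizer. Speaker-turn tracking and speaker clustering, which the abstract also mentions, are outside the scope of this sketch.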