Detection and Recovery of OOVs for Improved English Broadcast News Captioning

INTERSPEECH (2019)

Abstract
In this paper we present a study on building deep neural network-based speech recognition systems for automatic caption generation that can deal with out-of-vocabulary (OOV) words. We develop several systems using different acoustic modeling (hybrid, CTC, and attention-based neural networks) and language modeling (n-gram and RNN-based) techniques on broadcast news. We discuss the limitations of the proposed systems and introduce methods to use them effectively for OOV detection. For automatic OOV recovery, we compare different kinds of phonetic and graphemic sub-word units that can be synthesized into word outputs. On an experimental three-hour broadcast news test set with a 4% OOV rate, the proposed CTC and attention-based systems detect OOVs far more reliably (0.52 F-score) than a traditional hybrid baseline system (0.21 F-score). These detection gains translate into better WER performance. Relative to a non-OOV oracle baseline, the proposed systems, at just a 12% relative (1.4% absolute) loss in word error rate (WER), perform significantly better than the traditional hybrid system (close to 50% relative loss) by recovering OOVs from their sub-word outputs.
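The abstract describes recovering OOVs by stitching a recognizer's sub-word outputs back into word hypotheses and flagging the ones outside the closed vocabulary. The sketch below is a minimal illustration of that idea, not the paper's actual pipeline: the "@@" continuation marker, the toy vocabulary, and the example tokens are all assumptions made for demonstration, and the set-based F-score is a simplified stand-in for the paper's detection metric.

```python
# Illustrative sketch: merge graphemic sub-word pieces into words, flag
# out-of-vocabulary candidates, and score detection with a simple F1.
# The '@@' marker, toy vocabulary, and example hypothesis are assumptions.

def merge_subwords(tokens):
    """Merge BPE-style sub-word pieces ('@@' marks a non-final piece) into words."""
    words, buffer = [], ""
    for tok in tokens:
        if tok.endswith("@@"):       # non-final piece: keep accumulating
            buffer += tok[:-2]
        else:                        # final piece: close the current word
            words.append(buffer + tok)
            buffer = ""
    if buffer:                       # trailing unfinished piece, keep as-is
        words.append(buffer)
    return words

def detect_oovs(words, vocab):
    """Flag every merged word that falls outside the recognizer's vocabulary."""
    return [w for w in words if w not in vocab]

def f_score(detected, reference):
    """Precision/recall/F1 of detected OOV words against reference OOV words."""
    detected, reference = set(detected), set(reference)
    tp = len(detected & reference)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    vocab = {"the", "president", "visited", "today"}          # toy in-vocabulary set
    hyp_tokens = ["the", "president", "visited", "Kyz@@", "ylorda", "today"]
    words = merge_subwords(hyp_tokens)                        # -> [..., 'Kyzylorda', 'today']
    oovs = detect_oovs(words, vocab)
    print("recovered OOV candidates:", oovs)
    print("detection F-score:", f_score(oovs, {"Kyzylorda"}))
```

In practice the paper compares phonetic as well as graphemic sub-word units and measures detection and recovery within a full ASR pipeline; this toy example only shows how sub-word pieces can be synthesized into word candidates and checked against a closed vocabulary.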
Keywords
speech recognition, out-of-vocabulary word detection and recovery, end-to-end systems