Improving Language Models for ASR Using Translated In-Domain Data

ICASSP (2012)

Abstract
Acquiring in-domain training data to build speech recognition systems for under-resourced languages can be a costly, time-consuming, and tedious process. In this work, we propose using machine translation to translate English transcripts of telephone speech into Czech in order to improve a Czech conversational telephone speech (CTS) recognition system. The translated transcripts are used as additional language model training data in a scenario where the baseline language model is trained only on off-domain and close-domain data. We report perplexities, out-of-vocabulary (OOV) rates, and word error rates, and we examine how suitable different data sets and translators are for this task.
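The abstract does not state how the translated text is combined with the baseline data or which language modeling toolkit is used. As a rough illustration only, the following sketch assumes linear interpolation of two add-one-smoothed unigram models (a real CTS system would use higher-order n-gram models) and shows how the three reported metrics (perplexity, OOV rate, word error rate) are computed. The toy Czech sentences, the interpolation weight `lam`, and all function names are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy data: off-domain Czech text, machine-translated
# conversational transcripts, and a conversational test set.
BASELINE = ["vláda schválila nový zákon", "ceny energií dnes vzrostly"]
TRANSLATED = ["ahoj jak se máš", "no jo to je pravda"]
TEST = ["ahoj jak se máš dnes"]


def build_vocab(*corpora):
    """Collect every word type seen in the given training corpora."""
    return {w for corpus in corpora for s in corpus for w in s.split()}


def train_unigram(sentences, vocab):
    """Add-one smoothed unigram LM; words outside vocab map to <unk>."""
    counts = Counter(w if w in vocab else "<unk>" for s in sentences for w in s.split())
    total = sum(counts.values())
    size = len(vocab) + 1  # vocabulary plus the <unk> token
    return lambda w: (counts[w] + 1) / (total + size)


def interpolate(p_base, p_extra, lam):
    """Linear interpolation: lam * P_base(w) + (1 - lam) * P_extra(w)."""
    return lambda w: lam * p_base(w) + (1 - lam) * p_extra(w)


def perplexity(p, sentences, vocab):
    """exp of the average negative log-probability per test token."""
    tokens = [w if w in vocab else "<unk>" for s in sentences for w in s.split()]
    return math.exp(-sum(math.log(p(w)) for w in tokens) / len(tokens))


def oov_rate(sentences, vocab):
    """Fraction of test tokens not covered by the vocabulary."""
    tokens = [w for s in sentences for w in s.split()]
    return sum(w not in vocab for w in tokens) / len(tokens)


def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))  # distances for the empty reference prefix
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                       # deletion
                       d[j - 1] + 1,                   # insertion
                       prev + (r[i - 1] != h[j - 1]))  # substitution or match
            prev = cur
    return d[len(h)] / len(r)


base_vocab = build_vocab(BASELINE)
full_vocab = build_vocab(BASELINE, TRANSLATED)

# Score both LMs over the same (full) vocabulary so perplexities are comparable.
p_base = train_unigram(BASELINE, full_vocab)
p_trans = train_unigram(TRANSLATED, full_vocab)
p_mix = interpolate(p_base, p_trans, lam=0.5)  # lam chosen arbitrarily here

base_ppl = perplexity(p_base, TEST, full_vocab)
mix_ppl = perplexity(p_mix, TEST, full_vocab)

print(f"OOV rate, baseline vocab:  {oov_rate(TEST, base_vocab):.2f}")
print(f"OOV rate, + translations: {oov_rate(TEST, full_vocab):.2f}")
print(f"Perplexity baseline: {base_ppl:.1f}, interpolated: {mix_ppl:.1f}")
print(f"WER: {wer('ahoj jak se máš', 'ahoj jak máš'):.2f}")
```

In this toy setup, the translated conversational text covers words that the off-domain data never saw, which is what lowers the OOV rate. Perplexity is compared over one shared vocabulary because perplexities computed over different vocabularies are not directly comparable; in practice the interpolation weight would be tuned on held-out in-domain data rather than fixed.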
Keywords
Low-Resource ASR, Language Modeling, Machine Translation