Discriminative duration modeling for speech recognition with segmental conditional random fields

Acoustics, Speech and Signal Processing (2011)

Abstract
This paper describes a new approach to modeling duration for large-vocabulary continuous speech recognition (LVCSR) using SCARF, a toolkit for speech recognition with segmental conditional random fields. We use SCARF's ability to integrate long-span, segment-level features to design and test duration models that help discriminate between correct and incorrect word hypotheses. We show that the duration distributions of correct and incorrect word hypotheses differ. Given a word hypothesis in the lattice and its duration, conditional length probabilities are integrated into the SCARF system as duration features. We evaluate three kinds of duration features on Broadcast News: word durations, pre- and post-pausal durations, and word span confusions. Adding the duration features to SCARF yields an improvement of up to 0.3% over a state-of-the-art, discriminatively trained baseline with 15.3% WER on a Broadcast News task.
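To make the idea of a "conditional length probability" feature concrete, here is a minimal sketch, assuming word durations are measured in frames and collected from forced alignments of training data. It estimates a smoothed P(duration | word) and returns its log as a per-hypothesis feature value that a segmental model could weight. The function names, the add-alpha smoothing, the 100-frame cap, and the toy data are all hypothetical illustrations; this is not SCARF's API or the paper's exact estimator.

```python
import math
from collections import Counter, defaultdict


def train_duration_model(alignments, max_frames=100):
    """Count how often each word occurs with each duration (in frames).

    `alignments` is a hypothetical iterable of (word, frames) pairs,
    e.g. taken from forced alignments of training audio.
    Durations are clamped to the range 1..max_frames.
    """
    counts = defaultdict(Counter)
    for word, frames in alignments:
        counts[word][min(max(frames, 1), max_frames)] += 1
    return counts


def duration_log_prob(counts, word, frames, max_frames=100, alpha=1.0):
    """Add-alpha smoothed log P(duration = frames | word).

    Smoothing spreads `alpha` over all max_frames duration bins, so unseen
    words and unseen durations still receive a finite (low) score.
    """
    frames = min(max(frames, 1), max_frames)
    word_counts = counts.get(word, Counter())  # .get avoids inserting unseen words
    total = sum(word_counts.values())
    p = (word_counts[frames] + alpha) / (total + alpha * max_frames)
    return math.log(p)


# Usage: score lattice hypotheses by how typical their duration is for the word.
model = train_duration_model([("the", 10), ("the", 12), ("the", 10), ("speech", 40)])
typical = duration_log_prob(model, "the", 10)   # duration seen twice in training
unusual = duration_log_prob(model, "the", 80)   # duration never seen for "the"
print(typical > unusual)
```

A hypothesis whose duration is atypical for its word gets a lower feature value; in a discriminative model, the learned feature weight then decides how much that evidence should count against the hypothesis.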
Keywords
speech recognition, LVCSR, SCARF system, WER, discriminative duration modeling, post-pausal durations, segmental conditional random fields, automatic speech recognition, duration modeling