Evaluation of a Stack Decoder on a Japanese Newspaper Dictation Task

mag(1996)

Cited by 23
Abstract
This paper describes the evaluation of the 「のぞみ」 stack decoder for LVCSR on a 5000-word Japanese newspaper dictation task [3]. Using continuous-density acoustic models with 2000 and 3000 states trained on the JNAS/ASJ corpora and a 3-gram LM trained on the RWC text corpus, both provided by the IPA group, it was possible to reach more than 95% word accuracy on the standard test set. With computationally cheap acoustic models we could achieve around 89% accuracy in nearly real time on a 300 MHz Pentium II. Using a disk-based LM, total memory usage could be reduced to 4 MB.

1. A ONE-PASS STACK DECODER

A time-asynchronous stack decoder has proven to be an efficient approach to decoding for speech recognition. The decoder described here is, in its basic implementation, similar to the approach described in [1] and [2]. It can handle N-grams of arbitrary order and crossword models of any order in one left-to-right pass, which was found to be especially important for the recognition of Japanese. Two important modules of the decoder are described here in more detail, because they are largely what made a time- and memory-efficient search possible.
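At its core, such a one-pass stack decoder is a best-first search over partial word hypotheses kept on a priority queue (the "stack"). The sketch below is only a minimal illustration of that idea, not the paper's implementation: the helpers acoustic_score and lm_score, the beam_width parameter, and the frame handling are all assumptions made for the example.

```python
# Minimal sketch of a time-asynchronous, one-pass stack decoder loop.
# acoustic_score() and lm_score() are hypothetical stand-ins for the
# acoustic fast match and the (possibly disk-based) N-gram lookup.
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Hypothesis:
    neg_score: float                         # heapq is a min-heap, so negate the score
    end_frame: int = field(compare=False)    # how far this hypothesis reaches in time
    words: tuple = field(compare=False, default=())


def stack_decode(frames, vocabulary, beam_width=1000):
    """Best-first expansion of partial word sequences, left to right."""
    stack = [Hypothesis(0.0, 0, ())]         # start: empty hypothesis at frame 0

    while stack:
        hyp = heapq.heappop(stack)           # most promising partial hypothesis
        if hyp.end_frame >= len(frames):     # reached the end of the utterance
            return hyp.words

        # Extend by one word; a real decoder prunes this loop with a fast match,
        # and compares hypotheses with a normalized (A*-style) score so that
        # hypotheses ending at different frames stay comparable.
        for word in vocabulary:
            ac, new_end = acoustic_score(frames, hyp.end_frame, word, hyp.words)
            lm = lm_score(hyp.words, word)   # arbitrary-order N-gram probability
            child = Hypothesis(hyp.neg_score - (ac + lm), new_end, hyp.words + (word,))
            heapq.heappush(stack, child)

        # Keep the stack bounded to the best beam_width entries
        if len(stack) > beam_width:
            stack = heapq.nsmallest(beam_width, stack)
            heapq.heapify(stack)

    return ()
```

Because the previous word sequence is passed to both score functions, cross-word acoustic context and N-grams of any order fit naturally into this scheme, and the LM lookup can be served from disk with a small in-memory cache, which is consistent with the 4 MB memory figure quoted above.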