On Compressing N-Gram Language Models

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2007

Cited 5 | Viewed 22
Abstract
In large-vocabulary speech recognition systems, most of the memory is typically consumed by a large n-gram language model. Representing the language model compactly is therefore important in recognition systems targeted at small devices with limited memory. This paper extends the compressed language model structure proposed earlier by Whittaker and Raj. By separating n-grams that are prefixes of longer n-grams from those that are not, redundant information can be omitted. Experiments on English 4-gram models and Finnish 6-gram models show that the extended structure achieves lossless memory reductions of up to 30% compared to the baseline structure of Whittaker and Raj.
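To illustrate the general idea behind the abstract, the following minimal sketch partitions n-grams into those that are prefixes (contexts) of longer n-grams and those that are not; only the former need to keep backoff and child information, while the latter can store a probability alone. This is an assumption-laden illustration, not the paper's actual compressed structure: the split_prefix_ngrams helper, the dictionary input format, and the (logprob, backoff) record layout are all hypothetical.

```python
def split_prefix_ngrams(ngrams):
    """Partition n-grams into prefix n-grams (contexts of longer n-grams,
    which keep backoff/child information) and leaf n-grams (which only
    need a probability). Illustrative sketch, not the paper's structure."""
    # Hypothetical input: dict mapping n-gram tuples to (logprob, backoff).
    prefixes = set()
    for gram in ngrams:
        if len(gram) > 1:
            # In an ARPA-style backoff model every (n-1)-gram context is
            # itself stored, so marking immediate prefixes is sufficient.
            prefixes.add(gram[:-1])

    prefix_table, leaf_table = {}, {}
    for gram, (logprob, backoff) in ngrams.items():
        if gram in prefixes:
            prefix_table[gram] = (logprob, backoff)  # needs backoff + children
        else:
            leaf_table[gram] = logprob               # probability only
    return prefix_table, leaf_table


if __name__ == "__main__":
    toy_model = {
        ("the",): (-1.0, -0.5),
        ("the", "cat"): (-2.0, -0.3),
        ("the", "cat", "sat"): (-2.5, 0.0),
        ("dog",): (-1.5, 0.0),
    }
    prefix_table, leaf_table = split_prefix_ngrams(toy_model)
    print("prefix n-grams:", sorted(prefix_table))
    print("leaf n-grams:  ", sorted(leaf_table))
```

In a model where most n-grams of the highest order are not prefixes of anything, dropping the unused backoff and child fields for the leaf table is where the memory savings would come from.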
Keywords
data compression, natural language processing, speech coding, speech recognition, English 4-gram models, Finnish 6-gram models, compressing n-gram language models, large-vocabulary speech recognition systems, data structures, modeling, natural languages