Statistical language model based on a hierarchical approach: MCnv

INTERSPEECH(2001)

Abstract
In this paper, we propose a new language model based on dependent word sequences organized in a multi-level hierarchy. We call this model MCnv, where n is the maximum number of words in a sequence and v is the maximum number of levels. The originality of this model is its capacity to take into account dependent variable-length sequences for very large vocabularies. To discover the variable-length sequences and build the hierarchy, we use a set of syntactic classes extracted from the French elementary grammatical classes. The MCnv model learns hierarchical word patterns and uses them to reevaluate and filter the n-best utterance hypotheses output by our speech recognizer MAUD. The model has been trained on a corpus of [...] million words extracted from a French newspaper and uses a vocabulary of [...] words. Tests have been conducted on [...] sentences. Results show a [...] decrease in perplexity compared to an interpolated class trigram model. Rescoring the original n-best hypotheses resulted in an improvement of [...] in accuracy.
Keywords
language model