Publication I

semanticscholar(2012)

Abstract
This paper presents a baseline spoken document retrieval system in Finnish. Due to its agglutinative structure, Finnish speech cannot be adequately transcribed using the standard large vocabulary continuous speech recognition approaches. The definition of a sufficient lexicon and the training of the statistical language models are difficult, because the words appear transformed by many inflections and compounds. In this work we apply a recently developed unlimited vocabulary speech recognition system that allows the use of n-gram language models based on morpheme-like subword units discovered in an unsupervised manner. In addition to word-based indexing, we also propose an indexing based on the subword units provided directly by our speech recognizer. In an initial evaluation on Finnish news reading, we obtained a fairly low recognition error rate and average document retrieval precisions close to those obtained from human reference transcripts.