Experiments in Spoken Document Retrieval at CMU

TREC(1998)

Abstract
We describe our submission to the TREC-6 Spoken Document Retrieval (SDR) track, along with the speech recognition and information retrieval engines behind it. We present SDR evaluation results and a brief analysis. Several developments and experiments are also described in detail:
• Vocabulary size experiments, which assess the effect of words missing from the speech recognition vocabulary. For our 51,000-word vocabulary the effect was minimal.
• Speech recognition using a stemmed language model, where the model statistics of words containing the same root are combined. Stemmed language models did not improve speech recognition or information retrieval.
• Merging the IBM and CMU speech recognition data. Combining the results of two independent recognition systems slightly boosted information retrieval results.
• Confidence annotations that estimate the correctness of each recognized word. Confidence annotations did not appear to improve retrieval.
• N-best lists, where the top recognizer hypotheses are used for information retrieval. Using the top 50 hypotheses dramatically improved performance on the test set.
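The abstract does not say how the top recognizer hypotheses were combined before retrieval. As a minimal sketch only, one plausible reading is to pool term counts across the top N hypotheses for a document and index the pooled counts. The function name `pool_nbest_terms`, the whitespace tokenization, and the sample hypotheses are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def pool_nbest_terms(hypotheses, top_n=50):
    """Pool term counts over the first top_n hypotheses of one spoken document.

    `hypotheses` is assumed to be a list of transcript strings ordered from
    best to worst recognizer score (a hypothetical input format).
    """
    counts = Counter()
    for hyp in hypotheses[:top_n]:
        # Naive lowercase whitespace tokenization, chosen only for brevity.
        counts.update(hyp.lower().split())
    return counts


# Invented hypotheses for a single utterance, best first.
hyps = [
    "the senate passed the bill",
    "the senate past the bill",
    "a senate passed the bill",
]
pooled = pool_nbest_terms(hyps, top_n=2)
```

Under this assumption, terms that recur across hypotheses gain weight, while a word that appears only in a lower-ranked hypothesis still becomes retrievable.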
Keywords
language, language model, information retrieval, engines, reliability, statistics, hypotheses, weight function, speech recognition, error probability