Lucene and Juru at TREC 2007: 1-Million Queries Track

TREC(2007)

Abstract
Lucene is an increasingly popular open source search library. However, our search quality experiments on TREC data with out-of-the-box Lucene indicated inferior quality compared to other systems participating in TREC. In this work we investigate the differences in measured search quality between Lucene and Juru, our home-brewed search engine, and show how Lucene scoring can be modified to improve its measured search quality for TREC. Our scoring modifications to Lucene were trained over the 150 topics of the Terabyte tracks. Evaluations of these modifications with the new sample-based 1-Million Queries Track measures, NEU-Map and ε-Map, indicate the robustness of the scoring modifications: modified Lucene performs well when compared to stock Lucene and when compared to other systems that participated in the 1-Million Queries Track this year, both for the training set of 150 queries and for the new measures. As such, this also supports the robustness of the new measures tested in this track. This work reports our experiments and results and describes the modifications involved, namely normalizing term frequencies, a different choice of document length normalization, phrase expansion, and proximity scoring.
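To make the first two modifications named above more concrete, here is a minimal Python sketch of one common way to normalize term frequencies and document length. It is an illustration under stated assumptions, not the authors' exact formulas: the abstract gives no equations, so the log-dampened average-tf normalization and the pivoted length normalization below are well-known textbook variants, and the function names and the `slope` default are hypothetical.

```python
import math


def normalized_tf(freq, avg_freq):
    """Log-dampened term frequency divided by the log-dampened
    average term frequency of the document (assumed variant)."""
    if freq <= 0:
        return 0.0
    return (1.0 + math.log(freq)) / (1.0 + math.log(avg_freq))


def pivoted_length_norm(doc_len, avg_doc_len, slope=0.2):
    """Pivoted document length normalization with the pivot set to the
    average document length. The slope value is an illustrative
    assumption, not a value taken from the paper."""
    return 1.0 / ((1.0 - slope) * avg_doc_len + slope * doc_len)


def term_weight(freq, avg_freq, doc_len, avg_doc_len, slope=0.2):
    """Combine both normalizations into a single per-term document weight."""
    return normalized_tf(freq, avg_freq) * pivoted_length_norm(
        doc_len, avg_doc_len, slope
    )


if __name__ == "__main__":
    # A term occurring 3 times in a 250-token document whose average
    # term frequency is 1.5, in a collection averaging 200 tokens per doc.
    print(term_weight(freq=3, avg_freq=1.5, doc_len=250, avg_doc_len=200))
```

With this kind of weighting, a document of average length receives the same length factor regardless of the slope, while longer documents are penalized less steeply than under a plain 1/length or 1/sqrt(length) norm.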
Keywords
term frequency,search engine