Bridging the Terminology Gap in Web Archive Search

WebDB (2009)

Cited by 54
Abstract

Web archives play an important role in preserving our cultural heritage for future generations. When searching them, a serious problem arises from the fact that terminology evolves constantly. Since today's users formulate queries using current terminology, old but relevant documents are often not retrieved. The query "saint petersburg museum", for instance, does not retrieve documents from the 1970s about museums in Leningrad (the former name of Saint Petersburg). We address this problem by determining query reformulations that paraphrase the user's information need using terminology prevalent in the past. We first propose a measure of across-time semantic similarity that assesses the degree of relatedness between two terms when used at different times. Using this measure as a crucial building block, we then propose a novel query reformulation technique based on a hidden Markov model (HMM). Experiments on twenty years' worth of New York Times articles demonstrate the usefulness and efficiency of our approach.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Query formulation, Search process

General Terms
Keywords: semantic similarity, information need, hidden Markov model
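The abstract names two components, an across-time semantic similarity measure and an HMM-based reformulation step, but does not give their formulas. The Python sketch below is therefore an illustration under stated assumptions, not the paper's method: it scores across-time similarity as the cosine of document-level co-occurrence context vectors computed separately on a present and a past corpus, and it decodes a reformulation with the Viterbi algorithm over an HMM whose hidden states are candidate past terms (emission = across-time similarity to the query term, transition = smoothed co-occurrence of consecutive past terms in the past corpus). All function names, the per-term candidate lists, the smoothing constant, and the toy corpora are hypothetical.

```python
import math
from collections import Counter

# ASSUMPTION: illustrative sketch only; the paper's actual similarity measure
# and HMM parameterization are not reproduced here.


def context_vector(term: str, docs: list[str]) -> Counter:
    """Count how many documents each other word shares with `term`."""
    ctx: Counter = Counter()
    for doc in docs:
        words = set(doc.split())
        if term in words:
            ctx.update(words - {term})
    return ctx


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[word] for word, count in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def across_time_similarity(present_term, past_term, present_docs, past_docs) -> float:
    """Hypothetical measure: compare a present term's context with a past term's context."""
    return cosine(context_vector(present_term, present_docs),
                  context_vector(past_term, past_docs))


def cooccurrence(a: str, b: str, docs: list[str], smoothing: float = 0.1) -> float:
    """Smoothed fraction of documents containing both terms (never zero)."""
    both = sum(1 for doc in docs if {a, b} <= set(doc.split()))
    return (both + smoothing) / (len(docs) + smoothing)


def reformulate(query, candidates, present_docs, past_docs, eps=1e-9):
    """Viterbi decoding: observations are query terms, hidden states are past terms."""
    def emit(q, c):
        return math.log(across_time_similarity(q, c, present_docs, past_docs) + eps)

    # best[c] = (log score, best path ending in past term c)
    best = {c: (emit(query[0], c), [c]) for c in candidates[query[0]]}
    for q in query[1:]:
        new_best = {}
        for c in candidates[q]:
            # Pick the predecessor path that maximizes score + log transition.
            score, path = max(
                ((s + math.log(cooccurrence(p[-1], c, past_docs)), p)
                 for s, p in best.values()),
                key=lambda t: t[0],
            )
            new_best[c] = (score + emit(q, c), path + [c])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


# Toy corpora (hypothetical data, not from the paper's New York Times experiments).
present_docs = [
    "petersburg museum hermitage art collection",
    "petersburg city neva river palace",
    "museum art exhibition paintings",
]
past_docs = [
    "leningrad museum hermitage art collection",
    "leningrad city neva river palace",
    "moscow museum art exhibition",
    "moscow kremlin palace soviet",
]
candidates = {
    "petersburg": ["leningrad", "moscow", "petersburg"],
    "museum": ["museum", "kremlin"],
}
print(reformulate(["petersburg", "museum"], candidates, present_docs, past_docs))
# → ['leningrad', 'museum']
```

In this toy setup "petersburg" never appears in the past corpus, so its emission score collapses, while "leningrad" shares its entire context (hermitage, neva, palace) with the present-day "petersburg". The transition term then favors "museum" over "kremlin" because it co-occurs with "leningrad" in the past corpus. Supplying candidate lists per query term is a simplification; a real system would have to derive them from the past vocabulary.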