Usage meets link analysis: towards improving intranet and site specific search via usage statistics (2005)

Abstract
This thesis explores the possibility of incorporating usage statistics to improve ranking quality in site specific and intranet search engines. A number of usage based ranking approaches are introduced: a PageRank extension, Usage aware PageRank (UPR); an extension to HITS (UHITS); and a naive approach that uses the number of visits to a page as a quality measure. These methods are compared against each other and against two major link analysis approaches, PageRank and HITS. Weighting schemes are explored that take into account the probability of visiting a page directly (by typing its address or via bookmarks) as well as the relative probability of following a particular link from a given page. Both probabilities can be approximated from usage logs. Experiments are carried out on a site specific search engine incorporating the above methods, using 6+ months of usage logs centered around the site snapshot. The parameter space for UPR and UHITS is sampled to examine the effects of varying the usage emphasis factors. The experiments suggest that one of the proposed methods, UPR, is promising and has a number of desirable properties: it generalizes PageRank, inherits the basic PageRank properties, and is stable and flexible. Usage based signals such as UPR can be especially useful in an intranet or site specific search setting, where documents tend to be poorly connected compared to the Web but there is little or no incentive for spamming.
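The weighting idea in the abstract can be illustrated with a small sketch. This is not the thesis's formulation: the function name `usage_weighted_pagerank`, the single `usage_emphasis` parameter, and the linear blending of uniform and usage-derived probabilities are all assumptions made for illustration. The sketch only shows how click counts and direct-visit counts from a usage log could bias the link-following and jump probabilities of a PageRank-style iteration, and how setting the emphasis to zero falls back to ordinary PageRank.

```python
def usage_weighted_pagerank(links, link_clicks, direct_visits,
                            damping=0.85, usage_emphasis=0.5, iterations=100):
    """Illustrative PageRank variant biased by usage counts (hypothetical API).

    links:         dict page -> list of pages it links to
    link_clicks:   dict (src, dst) -> observed clicks on that link
    direct_visits: dict page -> visits that arrived by typing or bookmark
    usage_emphasis in [0, 1]: 0 gives plain PageRank, 1 uses usage data only
    """
    pages = sorted(set(links) | {d for outs in links.values() for d in outs})
    n = len(pages)
    a = usage_emphasis

    # Jump distribution: blend of uniform and share of direct visits.
    total_direct = sum(direct_visits.get(p, 0) for p in pages)
    jump = {}
    for p in pages:
        share = direct_visits.get(p, 0) / total_direct if total_direct else 1.0 / n
        jump[p] = (1 - a) / n + a * share

    # Link-following probabilities: blend of uniform and share of clicks.
    follow = {}
    out_links = {}
    for src in pages:
        outs = list(dict.fromkeys(links.get(src, [])))  # drop duplicate links
        out_links[src] = outs
        clicks = sum(link_clicks.get((src, dst), 0) for dst in outs)
        for dst in outs:
            share = link_clicks.get((src, dst), 0) / clicks if clicks else 1.0 / len(outs)
            follow[(src, dst)] = (1 - a) / len(outs) + a * share

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is redistributed via the jump distribution.
        dangling = sum(rank[p] for p in pages if not out_links[p])
        new_rank = {p: (1 - damping + damping * dangling) * jump[p] for p in pages}
        for (src, dst), prob in follow.items():
            new_rank[dst] += damping * rank[src] * prob
        rank = new_rank
    return rank


# Toy site with made-up counts: "docs" and "news" are structurally identical,
# but "docs" receives far more clicks and direct visits.
links = {"home": ["docs", "news"], "docs": ["home"], "news": ["home"]}
link_clicks = {("home", "docs"): 90, ("home", "news"): 10}
direct_visits = {"home": 50, "docs": 40, "news": 10}

baseline = usage_weighted_pagerank(links, link_clicks, direct_visits, usage_emphasis=0.0)
usage_aware = usage_weighted_pagerank(links, link_clicks, direct_visits, usage_emphasis=0.5)
```

In this toy graph the link structure alone cannot separate "docs" from "news", so the baseline scores them equally, while the usage-biased run favours the page that visitors actually reach more often. Sampling several values of `usage_emphasis` mirrors, in a very simplified way, the parameter sweep the abstract describes.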
Keywords
usage log,PageRank extension,intranet search engine,varying usage emphasis factor,link analysis,basic PageRank property,site specific search engine,generalizing PageRank,aware PageRank,site specific search setting,usage statistic