Web Search Based on Micro Information Units

MSRA (2002)

Abstract
Internet search is one of the most important applications of the Web. One shortcoming of existing search techniques is that they do not give due consideration to the micro-structures of a Web page. A Web page is often populated with a number of small information units, which we call micro information units (MIU). Each unit focuses on a specific topic and occupies a specific area of the page. During a search, if all the keywords in the user query occur in a single MIU of a page, the top-ranking results returned by a search engine are generally relevant and useful. However, if the query words are scattered across different MIUs in a page, the pages returned can be quite irrelevant. The reason is that although a page has information on individual MIUs, it may not have information on their intersections. In this paper, we propose a technique to solve this problem. At the off-line pre-processing stage, we segment each page to identify its MIUs and index the keywords of the page according to the MIUs in which they occur. In searching, our retrieval and ranking algorithm uses this additional information to return the most relevant pages. Experimental results show that this method is able to dramatically improve search precision. We show that the additional information on MIUs can be naturally integrated with the inverted-list indexing commonly used by Web search engines. In on-line search, our retrieval and ranking algorithm makes use of this MIU information to sort the relevant pages. Due to the seamless integration of MIUs with inverted lists, the additional computation required in searching is minimal. The proposed technique is intended to be used as an advanced search option or technique for a search engine (which we also call the base search engine). That is, when the precision of the results returned by the base search engine is low, we can employ the proposed technique to re-rank the results.
To evaluate the proposed technique, we use Google as the base search engine. Experimental results show that our method is able to improve Google's search precision dramatically.
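The idea of indexing keywords by MIU and preferring pages where all query words fall in one MIU can be illustrated with a small sketch. This is not the paper's algorithm: page segmentation is assumed to have been done already (each page is given as a list of MIU texts), the function names `build_index` and `search` and the sample pages are hypothetical, and the ranking is reduced to a simple two-level ordering.

```python
from collections import defaultdict


def build_index(pages):
    """Build an inverted index that also records the MIU of each occurrence.

    pages: {page_id: [miu_text, ...]}  (segmentation assumed done elsewhere)
    returns: {term: {page_id: {miu_id, ...}}}
    """
    index = defaultdict(lambda: defaultdict(set))
    for page_id, mius in pages.items():
        for miu_id, text in enumerate(mius):
            for term in text.lower().split():
                index[term][page_id].add(miu_id)
    return index


def search(index, query):
    """Return (page_id, all_terms_in_one_miu) pairs, co-occurring pages first."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]

    # Candidate pages must contain every query term somewhere on the page.
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)

    ranked = []
    for page_id in candidates:
        # MIUs of this page that contain all query terms at once.
        shared = set.intersection(*(p[page_id] for p in postings))
        ranked.append((page_id, bool(shared)))

    # Pages with a shared MIU come first; ties are broken by page id.
    ranked.sort(key=lambda item: (not item[1], item[0]))
    return ranked


# Hypothetical sample data: two pages, each already split into two MIUs.
pages = {
    "a": ["cheap flights to paris", "hotel deals in rome"],
    "b": ["cheap hotel deals in paris", "flights to rome"],
}
index = build_index(pages)
results = search(index, "paris hotel")
```

Both sample pages contain "paris" and "hotel", but only page "b" has them in the same MIU, so it is placed ahead of page "a". A real system would combine this signal with the base search engine's own relevance score rather than use it alone.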
Keywords
indexation, web search engine, web pages, line search, search engine