Using Metadata to Enhance Web Information Gathering

Lecture Notes in Computer Science, The World Wide Web and Databases (2001)

Abstract
With the web close to a billion pages and growing exponentially, we face the problem of rating pages for quality and trust. In this situation, what other pages say about a web page can be as important as what the page says about itself. The cumulative knowledge of such recommendations (or the lack thereof) can be objective enough to help a user or robot program decide whether to pursue a web document. In addition, these annotations, or metadata, can be used by a web robot program to derive summary information about web documents written in a language that the robot does not understand. We use this idea to drive a web information gathering system that forms the core of a topic-specific search engine. In this paper, we describe how our system uses metadata about hyperlinks to guide its crawl of the web. It sifts through information related to a particular topic and avoids traversing links that may not be of interest, so the guided crawling system stays focused on the target topic. It builds a rich repository of link information, including metadata, that ultimately serves a search engine.
Keywords
Resource Description Framework, Relevance Score, Anchor Text, Relevance Weighting, Target Topic
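
The guided crawling idea in the abstract can be illustrated with a small sketch. This is not the authors' system: it assumes a simple keyword-overlap score computed from anchor text as a stand-in for the paper's metadata-based relevance weighting, and it crawls an in-memory toy graph rather than the live web. The names `relevance_score`, `guided_crawl`, `TOY_GRAPH`, `TOPIC`, and the `threshold` parameter are all hypothetical.

```python
import heapq
import re


def relevance_score(anchor_text, topic_terms):
    """Fraction of topic terms found in the anchor text (0.0 to 1.0).

    Assumption: a plain keyword overlap, not the paper's actual weighting.
    """
    if not topic_terms:
        return 0.0
    words = set(re.findall(r"[a-z0-9]+", anchor_text.lower()))
    return len(words & topic_terms) / len(topic_terms)


def guided_crawl(link_graph, seeds, topic_terms, threshold=0.5, max_pages=100):
    """Best-first traversal that prunes links scoring below the threshold.

    link_graph maps a URL to a list of (target_url, anchor_text) pairs.
    Returns the visit order and a repository of link metadata records.
    """
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    repository = []

    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for target, anchor in link_graph.get(url, []):
            score = relevance_score(anchor, topic_terms)
            # Every link is recorded, even if it is not followed.
            repository.append(
                {"source": url, "target": target, "anchor": anchor, "score": score}
            )
            if score >= threshold and target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-score, target))
    return visited, repository


# Hypothetical toy data standing in for fetched pages.
TOY_GRAPH = {
    "a": [("b", "Database systems tutorial"), ("c", "Holiday photos")],
    "b": [("d", "Query optimization in database engines")],
}
TOPIC = {"database", "query"}

if __name__ == "__main__":
    order, links = guided_crawl(TOY_GRAPH, ["a"], TOPIC)
    print(order)
    for record in links:
        print(record)
```

In this sketch the link to page "c" is stored in the repository but never followed, because its anchor text shares no terms with the topic. A real system would replace the toy graph with fetched pages and could draw on richer metadata than anchor text alone.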