Automatic Web Page Categorization by Link and Context Analysis
msra(1999)
Abstract
Assistance in retrieving documents on the World Wide Web
is provided either by search engines, through
keyword-based queries, or by catalogues, which organize
documents into hierarchical collections. Maintaining
catalogues manually is becoming increasingly difficult,
due to the sheer amount of material on the Web; it is thus
becoming necessary to resort to techniques for the
automatic classification of documents. Automatic
classification is traditionally performed by extracting
the information for representing a document ("indexing")
from the document itself. The paper describes the novel
technique of categorization by context, which instead
extracts useful information for classifying a document
from the context where a URL referring to it appears. We
present the results of experimenting with Theseus, a
classifier that exploits this technique.
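
The idea of categorization by context can be illustrated with a minimal sketch. This is not the Theseus implementation, which the abstract does not detail. It assumes that the "context" of a URL is only the anchor text of the links pointing to it, and that a category is described by a hand-written keyword set. The names LinkContextParser, context_terms, and classify are hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collect, for each linked URL, the anchor text it appears with."""

    def __init__(self):
        super().__init__()
        self.contexts = defaultdict(list)  # URL -> list of anchor texts
        self._href = None                  # URL of the <a> currently open
        self._anchor = []                  # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            # Normalize whitespace before storing the anchor text.
            text = " ".join("".join(self._anchor).split())
            self.contexts[self._href].append(text)
            self._href = None

    def handle_data(self, data):
        if self._href:
            self._anchor.append(data)


def context_terms(html):
    """Map each URL to the lowercased words found in its link contexts."""
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    return {
        url: " ".join(texts).lower().split()
        for url, texts in parser.contexts.items()
    }


def classify(terms, categories):
    """Pick the category whose keywords overlap most with the context terms.

    Assumption: simple keyword overlap stands in for whatever scoring
    a real classifier would use. Returns None when nothing matches.
    """
    scores = {
        name: len(set(terms) & keywords)
        for name, keywords in categories.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Note that the target document is never fetched: the category is assigned using only the text of the pages that link to it, which is the contrast with traditional indexing drawn in the abstract.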
Keywords
search engine,web pages,content analysis,world wide web