Web directory construction using lexical chains

NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS (2005)

Abstract
Web Directories provide a way of locating relevant information on the Web. Typically, Web Directories rely on humans investing significant time and effort in finding important pages on the Web and categorizing them in the Directory. In this paper we present a way of automating the creation of a Web Directory. At a high level, our method takes as input a subject hierarchy and a collection of pages. We first leverage a variety of lexical resources from the Natural Language Processing community to enrich the hierarchy. We then process the pages and identify sequences of important terms, referred to as lexical chains. Finally, we use the lexical chains to decide where in the enriched subject hierarchy each page should be assigned. Our experimental results on real Web data show that our method is quite promising for assisting humans during page categorization.
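The three steps in the abstract (enrich the hierarchy, extract lexical chains, assign each page) can be illustrated with a toy sketch. This is not the authors' algorithm. The `RELATED` table is a hypothetical stand-in for real lexical resources such as WordNet; chains are built greedily by attaching each term to the first chain that already holds a related term; and a page goes to the category whose enriched term set touches the most chained terms. The names `RELATED`, `build_chains`, `enrich` and `assign`, the flat category-path hierarchy, and the `min_len` threshold are all illustrative assumptions.

```python
# Toy sketch of lexical-chain-based page categorization.
# Assumption: RELATED is a hypothetical stand-in for real lexical
# resources (synonyms/hypernyms); it is not taken from the paper.
RELATED = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"car", "automobile"},
    "engine": {"motor"},
    "motor": {"engine"},
    "guitar": {"instrument"},
    "instrument": {"guitar"},
}


def related(a, b):
    """True if two terms are identical or linked in the toy resource."""
    return a == b or b in RELATED.get(a, set()) or a in RELATED.get(b, set())


def build_chains(tokens):
    """Greedily append each token to the first chain holding a related term."""
    chains = []
    for tok in tokens:
        for chain in chains:
            if any(related(tok, w) for w in chain):
                chain.append(tok)
                break
        else:  # no related chain found: start a new one
            chains.append([tok])
    return chains


def enrich(hierarchy):
    """Expand each category's seed terms with their related terms."""
    return {
        cat: set(terms).union(*(RELATED.get(t, set()) for t in terms))
        for cat, terms in hierarchy.items()
    }


def assign(tokens, hierarchy, min_len=2):
    """Return (best category or None, per-category scores).

    A category's score is the total length of the chains (of at least
    min_len terms) that share a term with its enriched term set.
    """
    enriched = enrich(hierarchy)
    chains = [c for c in build_chains(tokens) if len(c) >= min_len]
    scores = {
        cat: sum(len(c) for c in chains if terms & set(c))
        for cat, terms in enriched.items()
    }
    best = max(scores, key=scores.get)
    # Leave the page unassigned (for a human) when nothing matches.
    return (best if scores[best] > 0 else None), scores


page = ["car", "engine", "automobile", "motor", "vehicle", "guitar"]
hierarchy = {"Recreation/Autos": {"car"}, "Arts/Music": {"guitar"}}
print(assign(page, hierarchy))
# ('Recreation/Autos', {'Recreation/Autos': 3, 'Arts/Music': 0})
```

Here the page's terms form the chains [car, automobile, vehicle], [engine, motor] and [guitar]. The singleton is dropped by `min_len=2`, and only the first chain overlaps the enriched `Recreation/Autos` terms, giving it a score of 3. Returning `None` when no category matches leaves the page for a human editor, which fits the assistive use described in the abstract.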
Keywords
important page, web directory construction, real web data, lexical resource, web directories, enriched subject hierarchy, important term, lexical chain, page categorization, subject hierarchy, web directory, natural language processing