An Approach to Page Ranking Based on Discourse Structures

Journal of Communications Software and Systems (2016)

Abstract
The World Wide Web (WWW), today's predominant source for Information Retrieval (IR), is essentially a set of hyperlinked documents. A web page that contains a larger number of related hyperlinks can satisfy the user's needs within a single page, so IR systems should give high priority to such pages. When assigning a rank to a web page, existing web mining techniques such as Hypertext Induced Topic Selection (HITS) and page ranking algorithms focus on the number of in-links and out-links present in the page. Rather than relying only on the number of links, discovering the semantic relations between a web page and the hyperlinks it contains can improve the quality of IR systems. Rhetorical Structure Theory (RST) is widely used to find semantic relations between text fragments by analysing the discourse structure of a text. In this paper, we propose a novel approach that uses RST-based discourse relations to find the semantic relation between a web page and the hyperlinks present in it. We have implemented and evaluated our approach on an IR system using 500 Tamil and 50 English tourism-domain-specific web pages. We also compare the proposed approach with an existing page ranking algorithm.
Keywords
Discourse structure, Link analysis, Rhetorical Structure Theory
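
The abstract's central idea, ranking a page by how many of its hyperlinks are actually related to it rather than by its raw link count, can be illustrated with a minimal sketch. This is not the paper's method: the RST-based relation detection is replaced here by a crude word-overlap test, and the function names, the 0.5 threshold, and the two sample pages are all hypothetical and chosen only for illustration.

```python
import re


def tokens(text):
    """Lowercased word set; a crude stand-in for real discourse analysis."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_related(page_text, link_text, threshold=0.5):
    """Hypothetical relatedness test (NOT RST): the share of link-text
    words that also occur in the page must reach the threshold."""
    link_words = tokens(link_text)
    if not link_words:
        return False
    overlap = len(link_words & tokens(page_text)) / len(link_words)
    return overlap >= threshold


def related_link_score(page_text, link_texts):
    """Count only the hyperlinks judged related to the page."""
    return sum(is_related(page_text, t) for t in link_texts)


# Invented sample data: page text plus the anchor texts of its links.
pages = {
    "A": ("temples and beaches to visit in Tamil Nadu",
          ["Tamil Nadu temples", "beaches", "cheap loans"]),
    "B": ("hotel booking",
          ["cheap loans", "win prizes", "free games", "hotel deals"]),
}

# Rank pages by related-link count instead of raw link count.
ranking = sorted(pages, key=lambda p: related_link_score(*pages[p]), reverse=True)
print(ranking)
```

In this toy data, page B has more links than page A (four versus three), yet A ranks first because more of its links are judged related to its content. A real system along the lines of the abstract would replace `is_related` with a discourse-relation check.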