Intelligent web monitoring - A hypertext mining-based approach

Manas A. Pathak, Vivek S Thakre

Journal of the Indian Institute of Science (2013)

Abstract
The World Wide Web has become one of the principal sources of information since its inception. With large amounts of content being added and deleted, the amount of change in hypertextual data is massive. This rapidly changing nature of the WWW makes tracking information manually intractable. In this paper we propose an approach for intelligently monitoring a website for changes, taking the user's interests into consideration and ranking the changes according to relevance. A prototype system, WebMon, based on this approach is presented. WebMon consists of basic components that perform infrastructural activities, such as crawlers and indexers. It also takes as input keyword weights based on the user's interests. It then represents the hypertextual data in the website in the form of a vector space model (VSM). This process is carried out periodically to obtain the VSM representing the hypertextual data of the website at that instant of time. To monitor for changes, the data in VSMs from different instants of time are compared, and the corresponding changes are ranked according to their relevance to the user. A modified nearest neighbor (NN) algorithm is implemented for this purpose. To further improve the accuracy and self-adjustability of the relevance rankings, the system employs a modified supervised learning algorithm, thereby taking the user's behavior into account. The WebMon system has been tested extensively on many websites and gave the expected results. In this paper we report some experimental results showing the effectiveness of the proposed approach.
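The comparison of VSM snapshots described in the abstract can be illustrated with a minimal sketch. It is not the WebMon implementation and does not reproduce the paper's modified nearest neighbor or supervised learning steps, whose details the abstract does not give. Instead, it assumes a plain keyword-weighted term-frequency vector per page and scores each page's change by the Euclidean distance between its old and new vectors. The function names (`term_vector`, `change_score`, `rank_changes`), the tokenizer, and the sample data are all hypothetical.

```python
import re
from collections import Counter


def term_vector(text, keyword_weights):
    """Build a weighted term-frequency vector over the user's keywords only.

    Assumption: weight * raw count is used as the vector component; the
    paper's actual weighting scheme is not specified in the abstract.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {term: counts[term] * weight for term, weight in keyword_weights.items()}


def change_score(old_vec, new_vec):
    """Euclidean distance between two snapshots of the same page."""
    terms = set(old_vec) | set(new_vec)
    squared = sum((new_vec.get(t, 0.0) - old_vec.get(t, 0.0)) ** 2 for t in terms)
    return squared ** 0.5


def rank_changes(old_site, new_site, keyword_weights):
    """Rank changed pages by how much their keyword-weighted vectors moved.

    old_site and new_site map a page URL to its text at two crawl times.
    Pages missing from one snapshot are treated as empty text.
    """
    scores = []
    for url in set(old_site) | set(new_site):
        old_vec = term_vector(old_site.get(url, ""), keyword_weights)
        new_vec = term_vector(new_site.get(url, ""), keyword_weights)
        score = change_score(old_vec, new_vec)
        if score > 0:
            scores.append((url, score))
    # Highest score first; ties broken by URL for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    weights = {"cricket": 2.0, "rain": 0.5}  # hypothetical user interests
    old = {"/a": "cricket news today", "/b": "weather report"}
    new = {
        "/a": "cricket cricket cricket news today",
        "/b": "weather report rain",
        "/c": "football",
    }
    for url, score in rank_changes(old, new, weights):
        print(url, score)
```

In this sketch a page that changes only in terms outside the user's keywords (such as `/c` in the sample data) receives a score of zero and is not reported, which mirrors the idea of ranking changes by relevance to user interests rather than by raw textual difference.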
Keywords
supervised learning, vector space model, relevance ranking, nearest neighbors, indexation, world wide web, nearest neighbor