Theme-Based Spider for Academic Paper.

Communications in Computer and Information Science (2017)

Abstract
Nowadays, web content multiplies every day. However, for a particular company or individual, some kinds of information have higher priority. For example, among the vast amount of information on the internet, web pages containing academic papers are clearly more attractive to a researcher. The problem lies in how to find such data. We therefore design a spider that targets only online academic papers. Besides retaining the three major parts of a traditional spider, we modify the Filter and the Parser so that our spider is capable of accomplishing this task. The essential mechanism for recognizing and extracting the expected pages relies primarily on keyword matching and Finite State Machine theory. After crawling two websites, the spider successfully collects the desired information. The results suggest that, with further optimization and modification, this theme-based spider may work more efficiently or even be extended to other fields of interest.
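The abstract names keyword matching (in the Filter) and a finite state machine (in the Parser) as the core mechanism but gives no further detail. The following is a minimal sketch under assumed choices, not the authors' implementation. It assumes each page has already been reduced to plain text. The keyword set `PAPER_KEYWORDS`, the threshold `min_hits=3`, the heading patterns, and the states `SEEK_ABSTRACT`, `IN_ABSTRACT` and `DONE` are all hypothetical. They only illustrate how a keyword filter can discard irrelevant pages before a state machine extracts one field (here, the abstract) from the pages that remain.

```python
import re
from enum import Enum, auto
from typing import List, Optional

# Hypothetical theme keywords: the paper does not list the ones it actually used.
PAPER_KEYWORDS = frozenset({"abstract", "keywords", "introduction", "references", "doi"})

# Illustrative heading patterns: where the abstract starts and what ends it.
ABSTRACT_START = re.compile(r"abstract\b")
ABSTRACT_END = re.compile(r"(keywords?|index terms|(\d+\.?\s*)?introduction)\b")


def keyword_filter(text: str, keywords=PAPER_KEYWORDS, min_hits: int = 3) -> bool:
    """Filter stage: keep a page only if enough distinct theme keywords appear in it."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(keywords & words) >= min_hits


class State(Enum):
    SEEK_ABSTRACT = auto()  # scanning lines for the "Abstract" heading
    IN_ABSTRACT = auto()    # collecting abstract text line by line
    DONE = auto()           # a terminating heading was reached


def extract_abstract(text: str) -> Optional[str]:
    """Parser stage: a finite state machine that walks the page line by line."""
    state = State.SEEK_ABSTRACT
    collected: List[str] = []
    for raw in text.splitlines():
        if state is State.DONE:
            break
        line = raw.strip()
        lower = line.lower()
        if state is State.SEEK_ABSTRACT:
            if ABSTRACT_START.match(lower):
                state = State.IN_ABSTRACT
                # Keep any text on the same line as the heading, e.g. "Abstract: ..."
                rest = line[len("abstract"):].lstrip(" :.-")
                if rest:
                    collected.append(rest)
        elif state is State.IN_ABSTRACT:
            if ABSTRACT_END.match(lower):
                state = State.DONE
            elif line:  # skip blank lines inside the abstract
                collected.append(line)
    return " ".join(collected) if collected else None


if __name__ == "__main__":
    page = (
        "Theme-Based Spider for Academic Paper\n"
        "Abstract: We design a spider that targets online academic papers.\n"
        "It relies on keyword matching.\n"
        "Keywords: spider, paper\n"
        "1 Introduction\n"
        "See the references and DOI below.\n"
    )
    if keyword_filter(page):  # five of the five hypothetical keywords occur
        # prints "We design a spider that targets online academic papers. It relies on keyword matching."
        print(extract_abstract(page))
```

Running the cheap keyword filter first means the state machine only processes pages that already look like papers. Keeping the current section as explicit state also keeps each extraction rule local to one state, so extracting another field (for example, authors) would mean adding states and transitions rather than rewriting the loop.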
Keywords
Theme-based, Spider, Paper