Crawler by Contextual Inference

SN Comput. Sci. (2021)

Abstract
With a million new pages added every day, the already gigantic web is growing exponentially. This growth challenges search engines and traditional information retrieval methods, which must still produce relevant results, and it equally challenges the crawler, which works in the background, traversing the web's hyperlink structure to obtain a snapshot of it. Traditional crawlers face the challenges of maintaining an appropriate traversal data structure and tracking already visited pages. Contemporary applications require context- and domain-specific crawlers that harvest the right set of pages and data. A focused crawler needs domain-specific evaluation parameters to assess pages and crawl the right set based on relevance. This paper proposes a novel model, Crawler by Contextual Inference, to achieve these objectives using semantic similarity, paradigmatic similarity, and inference rules. The proposed methodology uses a similarity matrix to generate inference rules and prioritizes links by the number of new rules built or discovered. The model proposes an efficient data structure, an intelligent queue, which holds the links in priority order. The paper also compares the results of the traditional crawler, crawler by inference, and the proposed model, crawler by contextual inference. The model promises better results by avoiding the crawl of irrelevant pages.
Keywords
Crawlers, Contextual, Inferences, Web
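
The abstract describes an intelligent queue that holds links on a priority basis and ranks them by the number of new inference rules they yield. The abstract does not give the queue's internals, so the following is only a minimal sketch of that general idea, not the authors' implementation. The class name `PriorityFrontier`, the `new_rule_count` argument, and the example URLs are all hypothetical; in particular, `new_rule_count` is assumed to be computed elsewhere and stands in for whatever score the model derives from its similarity matrix.

```python
import heapq
import itertools


class PriorityFrontier:
    """Crawl frontier that pops the highest-scoring unseen link first."""

    def __init__(self):
        self._heap = []                    # entries: (-score, tie_breaker, url)
        self._seen = set()                 # URLs that have been queued before
        self._counter = itertools.count()  # preserves insertion order on ties

    def push(self, url, new_rule_count):
        """Queue url once; new_rule_count is a hypothetical relevance score."""
        if url in self._seen:
            return False                   # already queued or visited, skip it
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-new_rule_count, next(self._counter), url))
        return True

    def pop(self):
        """Return the queued URL with the highest score."""
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


frontier = PriorityFrontier()
frontier.push("https://example.org/a", 2)
frontier.push("https://example.org/b", 5)
frontier.push("https://example.org/a", 9)  # duplicate URL, ignored
order = [frontier.pop() for _ in range(len(frontier))]
```

Keeping the visited set inside the queue addresses both challenges the abstract attributes to traditional crawlers (the traversal data structure and the tracking of visited pages) in one place. A real crawler would likely also need URL normalization and a way to re-score links that are already queued, which this sketch omits.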