A System Framework For Efficiently Recognizing Web Crawlers

2018 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI)(2018)

Abstract
In recent years, web crawlers have been widely used to collect data from the Internet. However, they cause many problems, including degraded QoS for normal visits, inaccurate data analysis, and business concerns about the data on websites. A systematic way to recognize web crawlers is therefore in high demand. In this paper, we propose a system framework to recognize web crawlers and take corresponding actions to handle them. Access requests to a website are recorded in logs, and a machine learning approach is then used to distinguish web crawlers from normal users based on those logs. The detailed components and procedures of the system framework are illustrated. Based on the system framework and approach, we implement an anti-crawler system. A twenty-day experiment shows that the system can recognize most requests from web crawlers and rarely misclassifies accesses from humans as coming from web crawlers.
Keywords
Web Crawler, System Framework, Machine Learning
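
The abstract describes recording access requests in logs and then applying machine learning to those logs. The abstract does not state the log format, the features, or the classifier used, so the following is only a minimal sketch of the log-to-feature step under stated assumptions: a Common Log Format style access log, grouping by client IP, and four illustrative features. The names `LOG_PATTERN`, `parse_line`, and `client_features` are hypothetical and do not come from the paper.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: Common Log Format style lines. The paper's actual log format is not given.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def parse_line(line):
    """Return (ip, timestamp, path, status) for one log line, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("path"), int(m.group("status"))


def client_features(lines):
    """Group requests by client IP and compute illustrative per-client features.

    The feature set is an assumption chosen for illustration, not the paper's.
    """
    by_ip = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            ip, ts, path, status = parsed
            by_ip[ip].append((ts, path, status))

    features = {}
    for ip, hits in by_ip.items():
        hits.sort(key=lambda h: h[0])  # order requests by time
        times = [h[0] for h in hits]
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        features[ip] = {
            "requests": len(hits),
            "mean_gap_s": sum(gaps) / len(gaps) if gaps else 0.0,
            "error_ratio": sum(1 for h in hits if h[2] >= 400) / len(hits),
            "robots_txt": any(h[1] == "/robots.txt" for h in hits),
        }
    return features
```

Each per-client feature dictionary could then be passed, together with labels, to whichever classifier is chosen; that choice is left open here because the abstract does not name one.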