An Intelligent Data-Centric Web Crawler Service for API Corpus Construction at Scale

2022 IEEE International Conference on Web Services (ICWS) (2022)

Abstract
The number of web APIs is growing rapidly, and API adoption is increasing across all industries as executives prioritize investments in the API economy. Each API provider publishes API documentation containing complex descriptions. To collect and understand the applications and operations of diverse APIs, software engineers must read lengthy and complicated API documentation, which makes understanding the variety of API documentation a labor-intensive and error-prone process. In this paper, we introduce a data-centric web crawler service that collects and analyzes API documentation and constructs a large corpus from it. The generated API corpus can be used in machine programming (e.g., code generation and code search). The proposed API web crawler intelligently harvests more than 2.8M API documentation pages, using a machine-learning-based approach with an accuracy of 91.32% to select only web API (REST) pages. We also conducted an extensive, end-to-end, real-world evaluation in which the proposed API web crawler not only collected a large number of API pages but also successfully validated 1,222 of 1,521 target APIs, a success rate of 80.34%.
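The core idea of a data-centric crawler, traversing pages but keeping only those a classifier accepts as REST API documentation, can be illustrated with a minimal sketch. It assumes a breadth-first crawl over an injected fetch function and uses a simple keyword heuristic as a hypothetical stand-in for the paper's machine-learning classifier, whose model is not described here. All names (crawl, rest_score, is_rest_api_page, extract_links) are illustrative assumptions, not the authors' implementation.

```python
import re
from collections import deque
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin

# Hypothetical stand-in for the paper's ML page classifier: it counts
# REST-style cues (HTTP verb followed by a path, plus common API terms).
HTTP_VERB_PATH = re.compile(r"\b(?:GET|POST|PUT|PATCH|DELETE)\s+/")
REST_TERMS = ("endpoint", "request", "response", "json", "api key")
HREF = re.compile(r'href="([^"#]+)"')


def rest_score(text: str) -> int:
    """Number of verb+path matches plus number of distinct REST terms present."""
    lowered = text.lower()
    verbs = len(HTTP_VERB_PATH.findall(text))
    terms = sum(1 for term in REST_TERMS if term in lowered)
    return verbs + terms


def is_rest_api_page(text: str, threshold: int = 3) -> bool:
    """Accept a page as REST API documentation if its score reaches the threshold."""
    return rest_score(text) >= threshold


def extract_links(base_url: str, html: str) -> List[str]:
    """Resolve every double-quoted href (fragments excluded) against the page URL."""
    return [urljoin(base_url, href) for href in HREF.findall(html)]


def crawl(
    seeds: List[str],
    fetch: Callable[[str], Optional[str]],
    classify: Callable[[str], bool] = is_rest_api_page,
    max_pages: int = 100,
) -> Dict[str, str]:
    """Breadth-first crawl; follow all links, but store only classified API pages."""
    seen = set(seeds)
    queue = deque(seeds)
    corpus: Dict[str, str] = {}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:  # unreachable page: skip it
            continue
        if classify(html):
            corpus[url] = html
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return corpus


# Usage with an in-memory "site" so no network access is needed.
SITE = {
    "https://example.com/": '<a href="/docs/users">Users</a> <a href="/blog">Blog</a>',
    "https://example.com/docs/users": (
        "GET /v1/users returns a JSON response. POST /v1/users creates one. "
        "Each endpoint needs an API key."
    ),
    "https://example.com/blog": "Our company news and hiring updates.",
}

corpus = crawl(["https://example.com/"], SITE.get)
print(sorted(corpus))  # → ['https://example.com/docs/users']
```

Keeping the classifier as a pluggable argument means a trained model could replace the heuristic without changing the traversal logic; navigation pages are still followed even when they are not stored in the corpus.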
Keywords
Web API, Web Crawler, Machine Learning