DynWebStats: A Framework for Determining Dynamic and Up-to-date Web Indicators.

WebMedia(2016)

Abstract
The growth and popularity of the Internet and, more specifically, of the World Wide Web and its services and applications have been widely discussed in recent years. Although this growth is common knowledge, obtaining indicators about it and about the characteristics of the whole Web, or even parts of it, is a major challenge, which can be explained by several factors: (1) the constant, dynamic evolution of the Web in many dimensions, which means that any analysis becomes obsolete as soon as it is ready; (2) the large volume of data needed to generate indicators, which is usually demanding in terms of bandwidth and storage, along with problems related to the ethics and network viability of the crawl; and (3) the coverage and freshness required to generate indicators, whether about domains or Web pages. This paper presents a new methodology for generating dynamic Web indicators that takes Web page changes into account, both modifications and creations or deletions. The methodology provides rational crawling and offers a measure of the quality of the indicators. To validate it, we run a simulation using a dataset of 8,690 Web pages that were downloaded daily for 134 days. The results show that it is possible to crawl a larger universe of Web pages while keeping the indicators within acceptable levels of confidence, making it possible to obtain a snapshot of this universe that is as close to reality as possible.