An Empirical Study of the Usage of Checksums for Web Downloads

WWW 2023(2023)

引用 0|浏览7
暂无评分
摘要
Checksums, typically provided on webpages and generated from cryptographic hash functions (e.g., MD5, SHA256) or signature schemes (e.g., PGP), are commonly used on websites to enable users to verify that the files they download have not been tampered with when stored on possibly untrusted servers. In this paper, we elucidate the current practices regarding the usage of checksums for web downloads (hash functions used, visibility and validity of checksums, type of websites and files, etc.), as this has been mostly overlooked so far. Using a snowball-sampling strategy for the 200000 most popular domains of the Web, we first crawled a dataset of  8.5M webpages, from which we built, through an active-learning approach, a unique dataset of 277 diverse webpages that contain checksums. Our analysis of these webpages reveals interesting findings about the usage of checksums. For instance, it shows that checksums are used mostly to verify program files, that weak hash functions are frequently used, and that a non-negligible proportion of the checksums provided on webpages do not match that of their associated files. Finally, we complement our analysis with a survey of the webmasters of the considered webpages (N = 26), thus shedding light on the reasons behind the checksum-related choices they make.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要