Link-Rot in Web-Sourced Multimedia Datasets.

MMM (1)(2023)

Abstract
The Web is increasingly used as a source of content for datasets of various types, especially multimedia datasets. These datasets are then often distributed as a collection of URLs pointing to the original sources of the elements. As these sources go offline over time, the datasets decay in the form of link-rot. In this paper, we analyze 24 Web-sourced datasets with a combined total of over 270 million URLs and find that over 20% of the content is no longer available. We discuss the adverse effects of this decay on the reproducibility of work based on such data and make some recommendations on how these effects could be mitigated in the future.
Keywords
Link rot, Dataset rot, Online datasets, Reproducibility
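
The availability measurement summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the function names (`fetch_status`, `is_available`, `link_rot_rate`), the use of HEAD requests, and the rule that any 2xx or 3xx response counts as "available" are all assumptions made for illustration. In particular, this rule would not detect pages that return a success status but no longer serve the original content.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 410.
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def is_available(status: Optional[int]) -> bool:
    """Assumed criterion: a 2xx or 3xx response means the item is reachable."""
    return status is not None and 200 <= status < 400


def link_rot_rate(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> float:
    """Return the fraction of URLs that are no longer available."""
    url_list = list(urls)
    if not url_list:
        return 0.0
    dead = sum(1 for url in url_list if not is_available(fetch(url)))
    return dead / len(url_list)
```

The `fetch` parameter is injectable so that the rate computation can be checked against recorded status codes without touching the network; at the scale of hundreds of millions of URLs, a real crawl would also need concurrency, retries, and rate limiting, which this sketch leaves out.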