A Toolset for Supporting Evolution and Preservation of Linked Data: the DIACHRON approach

Panagiotis Hasapis, Danae Vergeti, Aggelos Liapis, Antonis Ramfos, Giorgos Flouris, Kostas Stefanidis, Ioannis Chrysakis, Yannis Roussakis, George Papastefanatos, Yannis Stavrakas, Christos Pateritsas, Theodora Galani, Stratis D. Viglas

semanticscholar(2015)

Abstract
Over the last few years, a vast and rapidly growing quantity of scientific, corporate, government and crowd-sourced data has been published for open access on the emerging Data Web. Open Data is expected to act as a catalyst in the way structured information is exploited at large scale, offering great potential for building innovative products and services that create new value from already collected data. Open data published according to the Linked Data paradigm is essentially transforming the Web from a document-publishing environment into a knowledge ecosystem in which users have themselves become active data aggregators and generators. A traditional view of digital preservation, in which such datasets are pickled and locked away for future use like groceries, would conflict with their evolution. A number of approaches and frameworks, such as the LOD2 stack, manage the full life-cycle of the Data Web. More specifically, these techniques are expected to tackle major issues such as the synchronization problem (how can we monitor changes), the curation problem (how can data imperfections be repaired), the appraisal problem (how can we assess the quality of a dataset), the citation problem (how can we cite a particular version of a linked dataset), the archiving problem (how can we retrieve the most recent or a particular version of a dataset), and the sustainability problem (how can we spread preservation so as to ensure long-term access). In this paper we describe DIACHRON, a unified semantic platform for supporting the evolution and preservation of Linked Datasets, together with the modules that tackle these issues. For the synchronization problem, our approach allows the evolution of a dataset to be identified and analyzed in an efficient, user-friendly and customizable manner. The proposed solution supports queries that span multiple versions, as well as queries about the evolution itself (rather than just the data). For the citation problem, we describe a rule-based mechanism for specifying, extracting, and assigning citable persistent identifiers to diachronic resources. For the appraisal problem, a sequential process efficiently assesses a dataset's quality and provides the user with the corresponding quality metadata and a quality problem report. For the archiving problem, we have designed and developed a conceptual model that captures both structural and semantic aspects of evolving data, thus enabling evolution management at different granularity levels. Based on this model, we have implemented a query language, an extension of SPARQL, that inherently supports querying evolving entities and their changes across time. The platform is supported by three real-life use cases: a business use case, a life science use case, and an Open Data use case.
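
To make the idea of monitoring changes and querying across versions concrete, the following is a minimal sketch and not DIACHRON's actual implementation or API. It assumes each dataset version is held as a plain set of (subject, predicate, object) string triples; the names `Delta`, `diff_versions` and `versions_containing` are hypothetical and introduced only for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Set, Tuple

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


class Delta(NamedTuple):
    """Low-level change set between two versions (hypothetical structure)."""
    added: FrozenSet[Triple]
    deleted: FrozenSet[Triple]


def diff_versions(old: Iterable[Triple], new: Iterable[Triple]) -> Delta:
    """Return the triples added to and deleted from `old` to obtain `new`."""
    old_set, new_set = frozenset(old), frozenset(new)
    return Delta(added=new_set - old_set, deleted=old_set - new_set)


def versions_containing(versions: Dict[str, Set[Triple]], triple: Triple) -> List[str]:
    """Cross-version lookup: sorted labels of the versions that contain `triple`."""
    return sorted(label for label, triples in versions.items() if triple in triples)
```

A set-based delta like this only captures added and deleted triples. Higher-level, customizable descriptions of change, as the abstract mentions, would have to be built on top of such low-level deltas.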