FAIR Environmental Data through a STAC-Driven Inter-Institutional Data Catalog Infrastructure – Status quo of the Cat4KIT-project

Mostafa Hadizadeh, Christof Lorenz, Sabine Barthlott, Romy Fösig, Katharina Loewe, Corinna Rebmann, Benjamin Ertl, Robert Ulrich, Felix Bach

crossref(2024)

Abstract
In the rapidly advancing domain of environmental research, a comprehensive, state-of-the-art Research Data Management (RDM) framework is increasingly pivotal. Such a framework is key to ensuring FAIR data and lays the groundwork for transparent and reproducible earth system sciences. Today, datasets associated with research articles are commonly published via prominent data repositories like PANGAEA or Zenodo. In contrast, data used in day-to-day research and inter-institutional projects tends to be shared through basic cloud storage solutions or, even worse, via email. This practice often conflicts with the FAIR principles, as much of this data ends up in private, restricted systems and local storage, limiting its broader accessibility and use. In response to this challenge, our research project Cat4KIT aims to establish a cross-institutional catalog and Research Data Management framework. The Cat4KIT framework is, hence, an important building block towards the FAIRification of environmental data. It not only streamlines the process of ensuring availability and accessibility of large-scale environmental datasets but also significantly enhances their value for interdisciplinary research and informed decision-making in environmental policy. The Cat4KIT system comprises four essential elements: data service provision, (meta)data harvesting, catalog service, and user-friendly data presentation. The data service provision module is tailored to facilitate access to data within typical storage systems by using well-defined and standardized community interfaces via tools like the THREDDS Data Server, Intake catalogs, and the OGC SensorThings API. In this way, we ensure seamless data retrieval and management for typical use cases in environmental sciences.
(Meta)data harvesting via our DS2STAC package entails collecting metadata from various data services, creating STAC metadata from it, and integrating that metadata into our STAC-API-based catalog service. This catalog service module combines diverse datasets into a cohesive, searchable spatial catalog, enhancing data discoverability and utility via our Cat4KIT UI. Finally, our framework's data portal is tailored to improve data accessibility and comprehensibility for a wide audience, including researchers, enabling them to efficiently search, filter, and navigate through data from decentralized research data infrastructures. One notable characteristic of Cat4KIT is its reliance on open-source solutions and strict adherence to community standards. This ensures not only that the framework works well with current data systems but also that it can be easily adapted and expanded to meet future needs. Our presentation demonstrates the technical structure of Cat4KIT, examining how each module is developed and integrated to adhere to the FAIR principles. Additionally, it showcases examples that illustrate the practical use of the framework in real-life situations, emphasizing its efficacy in enhancing data management practices within KIT and its potential relevance for other research organizations.
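To make the harvesting step more concrete, the following is a minimal sketch of what turning harvested metadata into a STAC Item could look like. It is not the DS2STAC implementation: the function name `build_stac_item`, its parameters, and the example URL are hypothetical, and the sketch assumes that a spatial bounding box, a time range, and a data access URL have already been read from a data service. Only the Python standard library is used.

```python
import json
from datetime import datetime, timezone


def build_stac_item(item_id, bbox, start, end, data_url):
    """Return a minimal STAC Item (a GeoJSON Feature) as a plain dict.

    Hypothetical helper for illustration only, not part of DS2STAC.
    bbox is (west, south, east, north) in degrees; start and end are
    timezone-aware UTC datetimes.
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # Footprint as a closed polygon ring derived from the bounding box.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        # STAC allows "datetime" to be null when a start/end range is given.
        "properties": {
            "datetime": None,
            "start_datetime": start.isoformat().replace("+00:00", "Z"),
            "end_datetime": end.isoformat().replace("+00:00", "Z"),
        },
        # The asset points back to the data service that was harvested.
        "assets": {"data": {"href": data_url, "roles": ["data"]}},
        "links": [],
    }


if __name__ == "__main__":
    item = build_stac_item(
        "demo-dataset",
        (8.0, 48.5, 9.0, 49.5),
        datetime(2020, 1, 1, tzinfo=timezone.utc),
        datetime(2020, 12, 31, tzinfo=timezone.utc),
        "https://example.org/thredds/dodsC/demo.nc",  # placeholder URL
    )
    print(json.dumps(item, indent=2))
```

An Item of this shape is what a STAC-API-based catalog would index for spatial and temporal search; a real harvester would additionally attach collection links and richer properties.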