The Status of Data Management Practices Throughout the Data Life Cycle: A Mixed-Method Study Across MIRACUM Data Integration Centers

Kerstin Gierend, Sherry Freiesleben, Dennis Kadioglu, Fabian Siegel, Thomas Ganslandt, Dagmar Waltemath

Research Square (2023)

Abstract

Background: In the context of the Medical Informatics Initiative funded by the German government, medical data integration centers have implemented complex data flows to load routine health care data into research data repositories for secondary use. Data management practices are important throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge about these processes can lead to validity risks and weaken the quality of the extracted data. The need to collect provenance data during the data life cycle is undisputed, but the current status of provenance collection is largely unclear. Our study examines the current provenance tracking practices throughout the data life cycle within the MIRACUM consortium. We outline the current data management maturity status and present recommendations to enable trustworthy dissemination and re-use of patient data.

Methods: Our study uses a mixed-method design. We conducted semi-structured interviews with stakeholders from ten data integration centers between July and September 2021. To collect qualitative and quantitative data, we used a self-designed questionnaire tailored to the MIRACUM data integration centers. Our study method complies with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist.

Results: From a provenance perspective, our study provides insights into the data management practices concerning data extraction, transformation, storage, and provision. We identified several traceability and reproducibility issues that can be partially explained by a lack of contextual information within non-harmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete computational environment information. Based on the identified shortcomings, we suggest a data management maturity framework to provide more clarity and to help define enhanced data management strategies.

Conclusions: In this study, we present insights into provenance practices at the data integration centers. The data management maturity framework supports the production and dissemination of accurate, provenance-enriched data for secondary use. Furthermore, our work serves as a catalyst for deriving an overarching data management strategy that treats data integrity and provenance characteristics as key factors for high-quality, FAIR, and sustainable health and research data.
Keywords
MIRACUM data integration centers, data management practices, data life cycle, mixed-method