Data quality monitoring in clinical and observational epidemiologic studies: the role of metadata and process information

Richter, Adrian, Schössow, Janka, Werner, André, Schauer, Birgit, Radke, Dörte, Henke, Jörg, Struckmann, Stephan, Schmidt, Carsten Oliver

GMS Medizinische Informatik, Biometrie und Epidemiologie (2019)

Abstract
High data quality is fundamental for valid inferences in health research. Metadata, i.e. “data that describe other data”, are essential to implement data quality assessments, but more guidance on which metadata to use is needed. Similarly, the selection and use of variables describing the measurement process should be exemplified to improve the design and conduct of observational health studies. This work provides a conceptual framework and overview of metadata and process information for systematic data quality reports based on implementations within the population-based cohort Study of Health in Pomerania (SHIP). In previous years, a prerequisite for automated data quality checks was established by augmenting the data dictionary; the added information, up to 20 different characteristics for each variable, is used for data quality assessments and triggers diverse data quality checks. Conceptually, we distinguish static metadata, variable metadata, and process variables. Examples of static metadata are the expected probability distribution, plausibility limits, and the data type. Variable metadata may be reference limits of a laboratory marker. Information inherent to these metadata allows for the detection of data quality flaws by comparing observed with expected data characteristics. In contrast, process variables, such as the observer or device ID, also allow for the identification of sources of data quality issues. This is the case even if characteristics defined in metadata were not violated. Metadata and process variables can be used alone or in combination to implement a versatile and efficient data quality assessment. A comprehensive setup of metadata and process variables is an extensive task, particularly in studies involving large data collections. Nonetheless, the gain in transparency and efficacy of data curation and quality reporting after this setup is considerable.
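The distinction between static metadata and process variables can be illustrated with a minimal Python sketch. It is not the SHIP implementation: the variable name `sbp`, the plausibility limits, the `observer_id` field, and both helper functions are hypothetical and chosen only for illustration. The first check compares observed values with expected characteristics from the metadata (data type and plausibility limits); the second groups the remaining values by a process variable, which can point to a source of a data quality issue even when no metadata rule is violated.

```python
from statistics import mean

# Hypothetical static metadata for one variable (name and limits are illustrative).
METADATA = {
    "sbp": {"data_type": float, "plausibility_limits": (60.0, 260.0)},
}


def check_static(records, variable, metadata=METADATA):
    """Return indices of records that violate the data type or plausibility limits."""
    spec = metadata[variable]
    low, high = spec["plausibility_limits"]
    violations = []
    for i, rec in enumerate(records):
        value = rec.get(variable)
        # A missing or wrongly typed value is flagged before the range is tested.
        if not isinstance(value, spec["data_type"]) or not (low <= value <= high):
            violations.append(i)
    return violations


def mean_by_process_variable(records, variable, process_variable):
    """Return the mean of `variable` for each level of a process variable."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[process_variable], []).append(rec[variable])
    return {level: mean(values) for level, values in groups.items()}


# Made-up example records; observer_id plays the role of a process variable.
records = [
    {"sbp": 118.0, "observer_id": "A"},
    {"sbp": 122.0, "observer_id": "A"},
    {"sbp": 141.0, "observer_id": "B"},
    {"sbp": 139.0, "observer_id": "B"},
    {"sbp": 400.0, "observer_id": "A"},  # outside the plausibility limits
]

flagged = check_static(records, "sbp")
valid = [r for i, r in enumerate(records) if i not in flagged]
observer_means = mean_by_process_variable(valid, "sbp", "observer_id")
```

In this made-up data, only the last record violates the plausibility limits, yet the per-observer means differ noticeably among the remaining records. That is the kind of pattern a process variable can expose although every retained value is individually plausible.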
Keywords
data quality, metadata, process variables, data monitoring, health research, cohort studies