Exploring the tradeoff between data privacy and utility with a clinical data analysis use case: a case report

Research Square (2023)

Abstract
Securing adequate data privacy is critical for the productive utilization of data. De-identification, which involves masking or replacing specific values in a dataset, can damage the dataset’s utility, and finding a reasonable balance between data privacy and utility is not straightforward. Nonetheless, few studies have investigated how data de-identification efforts affect data analysis results. This study aimed to demonstrate the effect of different de-identification methods on a dataset’s utility with a clinical analytic use case and to assess the feasibility of finding a workable tradeoff between data privacy and utility. Predictive modeling of emergency department length of stay was used as the data analysis use case. A logistic regression model was developed with 1155 patient cases extracted from a clinical data warehouse of an academic medical center located in Seoul, South Korea. Nineteen de-identified datasets were generated based on various de-identification configurations using ARX. The variable distributions and prediction results were compared between the de-identified datasets and the original dataset to observe the association between data privacy and utility, and to determine whether it is feasible to identify a viable tradeoff between the two. The findings demonstrated that securing data privacy resulted in some loss of data utility. Because ensuring data privacy while maintaining utility is a complex process, understanding the purpose of data use may be required. Including the data user in the data de-identification process may help in finding an acceptable tradeoff between data privacy and utility.
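To make the privacy and utility tension concrete, the following is a minimal Python sketch, not the study's ARX workflow. It assumes k-anonymity as the privacy measure (the abstract does not state which privacy models were configured) and uses hypothetical toy records with made-up quasi-identifiers (age, sex, postal code). Generalizing values raises k, which is the privacy gain, but collapses the number of distinct age values, which is a crude proxy for lost utility.

```python
from collections import Counter

# Hypothetical toy records (age, sex, postal code); NOT data from the study.
records = [
    (34, "F", "04524"),
    (36, "F", "04526"),
    (38, "F", "04521"),
    (61, "M", "06351"),
    (65, "M", "06355"),
    (67, "M", "06359"),
]


def generalize(record, age_band=10, zip_digits=3):
    """Replace the exact age with a band and mask trailing postal-code digits."""
    age, sex, zip_code = record
    low = (age // age_band) * age_band
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (f"{low}-{low + age_band - 1}", sex, masked_zip)


def k_anonymity(rows):
    """Size of the smallest group of rows sharing identical quasi-identifiers."""
    return min(Counter(rows).values())


# Privacy before and after generalization (assumed measure: k-anonymity).
k_before = k_anonymity(records)
deidentified = [generalize(r) for r in records]
k_after = k_anonymity(deidentified)

# Crude utility proxy: how many distinct age values survive.
distinct_ages_before = len({r[0] for r in records})
distinct_ages_after = len({r[0] for r in deidentified})

print(f"k: {k_before} -> {k_after}")
print(f"distinct age values: {distinct_ages_before} -> {distinct_ages_after}")
```

Widening `age_band` or lowering `zip_digits` pushes k higher at the cost of coarser variables, which is the kind of configuration choice that a downstream analysis such as a prediction model would be sensitive to.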
Keywords
data privacy, case report