A comparison study on creating simulated patient data for individuals suffering from chronic coronary disorders

2023 45TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY, EMBC (2023)

Abstract
Virtual population (VP) and synthetic data generation is an emerging area of data science that has recently gained attention. It has the potential to significantly affect the healthcare industry by providing a means to augment clinical research databases that lack sufficient subjects. The current study presents a comparative analysis of five distinct approaches for creating virtual populations from real patient data. The dataset used in the analyses comprised clinical data collected from patients scheduled for elective coronary artery bypass graft (CABG) surgery. The five computational techniques employed to augment this dataset were: (i) Tabular Preset, (ii) a Gaussian Copula model, (iii) a Generative Adversarial Network (GAN)-based deep learning data synthesizer (CTGAN), (iv) a variant of the CTGAN model (Copula GAN), and (v) a Variational Autoencoder (VAE)-based deep learning data synthesizer (TVAE). Each technique was assessed on its effectiveness in producing high-quality virtual data. For this purpose, dataset correlation matrices, cosine similarity, density histograms, and kernel density estimation were used to compare each attribute with its synthetic counterpart. Our findings demonstrate that the Gaussian Copula model outperforms the other approaches in creating virtual data with consistent distributions (Kolmogorov-Smirnov (KS) and Chi-Squared (CS) test scores of 0.9 and 0.98, respectively) and correlation patterns (average cosine similarity of 0.95).
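The abstract names its fidelity metrics but not their exact definitions. The sketch below is a minimal, dependency-free illustration under our own assumptions, not the authors' code. It takes the KS score to be 1 minus the two-sample Kolmogorov-Smirnov statistic (a convention used by some synthetic-data toolkits). It also takes "average cosine similarity" to mean the mean cosine similarity between corresponding rows of the real and synthetic Pearson correlation matrices. The Chi-Squared score for categorical attributes is omitted. All function names and the column-dictionary table layout are hypothetical, and the toy values are illustrative only, not study data.

```python
import math
from typing import Dict, List, Sequence

Table = Dict[str, List[float]]  # hypothetical layout: column name -> numeric values


def ks_complement(real: Sequence[float], synth: Sequence[float]) -> float:
    """Return 1 - D, where D is the two-sample Kolmogorov-Smirnov statistic.

    1.0 means identical empirical CDFs; 0.0 means fully separated samples.
    """
    r, s = sorted(real), sorted(synth)
    i = j = 0
    d = 0.0
    # Step through the pooled values in order; after consuming every value <= x
    # from both samples, i/len(r) and j/len(s) are the two empirical CDFs at x.
    # Once one sample is exhausted its CDF is 1, so the gap can only shrink.
    while i < len(r) and j < len(s):
        x = min(r[i], s[j])
        while i < len(r) and r[i] <= x:
            i += 1
        while j < len(s) and s[j] <= x:
            j += 1
        d = max(d, abs(i / len(r) - j / len(s)))
    return 1.0 - d


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes both columns are non-constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def corr_matrix(table: Table, cols: List[str]) -> List[List[float]]:
    """Pearson correlation matrix over the given column order."""
    return [[pearson(table[a], table[b]) for b in cols] for a in cols]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def avg_row_cosine(real: Table, synth: Table) -> float:
    """Mean cosine similarity between matching rows of the two correlation matrices."""
    cols = list(real)  # use the real table's column order for both matrices
    cr, cs = corr_matrix(real, cols), corr_matrix(synth, cols)
    return sum(cosine(a, b) for a, b in zip(cr, cs)) / len(cols)


# Illustrative toy columns (not from the study).
real = {"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.1, 5.9, 8.2]}
synth = {"x": [1.0, 2.0, 3.0, 5.0], "y": [2.1, 3.9, 6.2, 7.8]}
print(ks_complement(real["x"], synth["x"]))  # 0.75
print(avg_row_cosine(real, synth))
```

Comparing correlation matrices row by row rewards a synthesizer that preserves each attribute's pattern of relationships with the others, which is the property the abstract reports under "correlation patterns"; the per-column KS score, by contrast, checks each marginal distribution independently.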