Developing a Machine Learning-Based Prediction Model for Diabetes Duration Using Information from Electronic Health Records

DIABETES (2023)

Abstract
Diabetes duration is key information for epidemiologic studies but is not routinely collected in real-world data such as claims and electronic health records (EHRs). This study aimed to build a predictive model for diabetes duration to support future research. Using data from the National Health and Nutrition Examination Survey (2009 to 2018), we identified individuals with self-reported diabetes and extracted information that is routinely collected in EHRs. Predictors included demographics (e.g., age, sex), biomarkers (e.g., HbA1c, systolic blood pressure), diabetes-related comorbidities (e.g., retinopathy, end-stage renal disease), and glucose-lowering therapy (e.g., insulin, metformin). We used diabetes duration (in years) as the outcome. We compared an ordinary least squares (OLS) model, least absolute shrinkage and selection operator (LASSO) regression, random forest, and extreme gradient boosting (XGBoost), using 10-fold cross-validation to tune hyperparameters. A total of 3,267 survey participants were included, with a median diabetes duration of 9 years (Q1: 4 years, Q3: 16 years). LASSO regression achieved the best performance (root mean square error [RMSE]: 7.62, mean absolute error [MAE]: 5.74, average error [AE]: 0.53), followed by random forest (RMSE: 7.63, MAE: 5.74, AE: 0.47), XGBoost (RMSE: 7.63, MAE: 5.76, AE: 0.55), and the OLS model (RMSE: 7.64, MAE: 5.76, AE: 0.59). The random forest algorithm identified age, insulin therapy, metformin monotherapy, retinopathy, and HbA1c as the most important predictors of diabetes duration. Predictions were more accurate when the diabetes duration was <10 years (RMSE: 4.43, MAE: 3.66, AE: 0.10) or 10 to 20 years (RMSE: 5.71, MAE: 4.72, AE: 0.79). Our model could predict diabetes duration using information available in EHR data. Model performance improved when applied to individuals who had lived with diabetes for less than 20 years.
Disclosure: D. Guan: None. T. Jiao: None. H. Shao: Consultant; Lilly Diabetes. P. Li: None. V. Fonseca: Consultant; Abbott, Corcept Therapeutics, Eli Lilly and Company, Other Relationship; BRAVO4HEALTH, LLC, Research Support; Fractyl Health, Inc., Stock/Shareholder; Amgen Inc. L. Shi: None. M.K. Ali: Advisory Panel; Bayer Inc., Eli Lilly and Company, Research Support; Merck & Co., Inc. J. Varghese: None. R.M. Carrillo-Larco: None. M. Rouhizadeh: None. A.G. Winterstein: Consultant; Bayer Inc., Genentech, Inc., Ipsen Biopharmaceuticals, Inc., Research Support; Merck Sharp & Dohme Corp.
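The model comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and several parts are assumptions: the data are synthetic stand-ins for four of the named predictors, scikit-learn's `GradientBoostingRegressor` is swapped in for XGBoost to avoid an extra dependency, the hyperparameter grids are arbitrary, performance is measured on a held-out split (the abstract does not say how the reported errors were computed), and AE is taken to mean the mean signed error (predicted minus observed), which the abstract does not define.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split


def error_metrics(y_true, y_pred):
    """Return RMSE, MAE, and AE (assumed here: mean of predicted minus observed)."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "AE": float(np.mean(err)),
    }


# Synthetic stand-in data (hypothetical; NOT the NHANES sample from the study).
rng = np.random.default_rng(0)
n = 600
X = np.column_stack([
    rng.normal(60, 12, n),    # age in years
    rng.normal(7.5, 1.5, n),  # HbA1c in %
    rng.integers(0, 2, n),    # insulin therapy (0/1)
    rng.integers(0, 2, n),    # retinopathy (0/1)
])
# Made-up relationship used only to generate a plausible duration in years.
y = np.clip(0.25 * (X[:, 0] - 40) + 5 * X[:, 2] + 3 * X[:, 3]
            + rng.normal(0, 5, n), 0, None)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 10-fold cross-validation tunes each model's hyperparameters on the training set.
scoring = "neg_root_mean_squared_error"
models = {
    "OLS": LinearRegression(),
    "LASSO": LassoCV(cv=10, random_state=0),
    "Random forest": GridSearchCV(
        RandomForestRegressor(n_estimators=50, random_state=0),
        {"max_depth": [3, 6]}, cv=10, scoring=scoring),
    "Gradient boosting (XGBoost stand-in)": GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        {"learning_rate": [0.05, 0.1]}, cv=10, scoring=scoring),
}

results = {}
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    results[name] = error_metrics(y_test, predictions[name])
    print(name, results[name])

# Errors within bands of observed duration, mirroring the abstract's strata.
bands = {
    "<10 years": y_test < 10,
    "10 to 20 years": (y_test >= 10) & (y_test <= 20),
    ">20 years": y_test > 20,
}
band_results = {
    label: error_metrics(y_test[mask], predictions["LASSO"][mask])
    for label, mask in bands.items() if mask.any()
}
for label, metrics in band_results.items():
    print(label, metrics)
```

With real data, the synthetic `X` and `y` would be replaced by the EHR-style predictors and the self-reported duration. Numbers printed by this sketch come from simulated data and say nothing about the study's results.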
Keywords
diabetes duration, machine learning-based prediction model