Evaluation of Statistical Approaches in Developing a Predictive Model of Severe COVID-19 during Early Phase of Pandemic with Limited Data Resources

TOHOKU JOURNAL OF EXPERIMENTAL MEDICINE (2024)

Abstract
Because evidence on risk factors for severe coronavirus disease 2019 (COVID-19) was uncertain in the early phases of the pandemic, developing an efficient predictive model of severe cases to triage high-risk individuals was an urgent yet challenging issue. It is crucial to select appropriate statistical models when available data and evidence are limited. This study was conducted to assess the accuracy of different statistical models in predicting severe cases using demographic data from patients with COVID-19 prior to the emergence of consequential variants. We analyzed data from 929 consecutive patients diagnosed with COVID-19 prior to March 2021, including their age, sex, body mass index, and past medical histories, and compared the areas under the receiver operating characteristic curve (ROC AUC) of the different statistical models. The random forest (RF) model, deep learning (DL) models with a modest number of neurons, and the naive Bayes model achieved AUC values > 0.70 on the validation datasets. The naive Bayes model performed best, with AUC values > 0.80. The RF models were more robust than the DL models, showing a narrower distribution of AUC values. Performing feature selection on the training dataset before building models was beneficial in some models, but not in all. In summary, the naive Bayes and RF models exhibited ideal predictive performance even with limited available data. The benefit of performing feature selection before building models with limited data resources depended on the machine learning method and its parameters.
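The comparison metric named in the abstract, ROC AUC, can be illustrated with a minimal sketch. This is not the authors' code: the function name `roc_auc`, the model labels, and all numbers below are made up for illustration. The sketch uses the standard rank-sum (Mann-Whitney U) identity, under which the AUC is the probability that a randomly chosen severe case receives a higher predicted score than a randomly chosen non-severe case.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum identity; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))  # ascending by score
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run [i, j) of identical scores.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Toy validation set (made-up values, NOT data from the study):
# 1 = severe case, 0 = non-severe case.
y_true = [0, 0, 1, 1, 0, 1]

# Hypothetical predicted probabilities from two hypothetical models.
model_scores = {
    "model_a": [0.10, 0.40, 0.35, 0.80, 0.20, 0.90],
    "model_b": [0.50, 0.50, 0.50, 0.60, 0.40, 0.45],
}

for name, scores in model_scores.items():
    print(f"{name}: AUC = {roc_auc(y_true, scores):.3f}")
```

Repeating such a calculation over many train/validation splits would give a distribution of AUC values per model, which is the kind of spread the abstract refers to when comparing the robustness of RF and DL.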
Keywords
coronavirus disease 2019 (COVID-19), deep learning, naive Bayes, neural network, random forest