Missing data is poorly handled and reported in prediction model studies using machine learning: a literature review

SWJ Nijman, AM Leeuwenberg, I Beekers, I Verkouter, JJL Jacobs, ML Bots, FW Asselbergs, KGM Moons, TPA Debray

Journal of Clinical Epidemiology (2022)

Abstract
• Prediction model studies that adopt machine learning (ML) methods rarely report the presence and handling of missing data.
• Although many types of machine learning methods offer built-in capabilities for handling missing values, these strategies are rarely used. Instead, most ML-based prediction model studies resort to complete case analysis or mean imputation.
• Missing data are often poorly handled and reported, even when adopting advanced machine learning methods for which advanced imputation procedures are available.
• The handling and reporting of missing data in prediction model studies should be improved. A general recommendation to avoid bias is to use multiple imputation. It is also possible to consider machine learning methods with built-in capabilities for handling missing data (e.g., decision trees with surrogate splits, use of pattern submodels, or incorporation of autoencoders).
• Authors should take note of and appreciate the existing reporting guidelines (notably, TRIPOD and STROBE) when publishing ML-based prediction model studies. These guidelines offer a minimal set of reporting items that help to improve the interpretation and reproducibility of research findings.
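
To illustrate the contrast the highlights draw between complete case analysis, mean imputation, and multiple imputation, here is a minimal sketch on simulated data. It is not taken from the paper: the data-generating process, the function names `fit_line` and `multiply_impute_mean`, and the choice of m = 20 imputations are all assumptions made for illustration. The imputation step is a simplified stochastic regression imputation that keeps the fitted coefficients fixed; a fully proper multiple imputation would also draw the regression parameters for each imputed data set.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def multiply_impute_mean(xs, ys, m=20, seed=0):
    """Estimate the mean of y from m imputed data sets, pooled with Rubin's rules.

    Missing y values (None) are replaced by a regression prediction from x
    plus random noise, so each imputed data set differs.
    """
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in observed], [y for _, y in observed])
    estimates, variances = [], []
    for _ in range(m):
        completed = [
            y if y is not None else a + b * x + rng.gauss(0, sd)
            for x, y in zip(xs, ys)
        ]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average sampling variance
    between = statistics.variance(estimates)  # variance across imputations
    total_var = within + (1 + 1 / m) * between
    return pooled, total_var


# Simulated data (assumption): y depends on x, and y is missing more often
# when x is large, i.e. missingness depends only on an observed variable.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(2000)]
full_y = [2.0 + 1.5 * x + rng.gauss(0, 1) for x in xs]
ys = [None if (x > 0 and rng.random() < 0.7) else y for x, y in zip(xs, full_y)]

true_mean = statistics.fmean(full_y)

# Complete case analysis: drop every record with a missing y.
cc_mean = statistics.fmean(y for y in ys if y is not None)

# Mean imputation: fill every missing y with the observed mean.
mean_imputed = [y if y is not None else cc_mean for y in ys]
mean_imp_mean = statistics.fmean(mean_imputed)

# Multiple imputation (simplified, see the note above).
mi_mean, mi_var = multiply_impute_mean(xs, ys)

print(f"true mean of y:        {true_mean:.3f}")
print(f"complete case mean:    {cc_mean:.3f}")
print(f"mean imputation mean:  {mean_imp_mean:.3f}")
print(f"multiple imputation:   {mi_mean:.3f} (pooled SE {mi_var ** 0.5:.3f})")
```

In this setup, mean imputation reproduces the complete case estimate of the mean, so it inherits whatever bias the complete cases carry, whereas the regression-based imputations use the observed x values to recover the missing part of the distribution. Real analyses would typically rely on an established implementation rather than hand-written code.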
Keywords
Missing data, Machine learning, Prediction, Reporting, Literature review