Prediction of Venous Thromboembolism in Diverse Populations Using Machine Learning and Structured Electronic Health Records

Arteriosclerosis, Thrombosis, and Vascular Biology (2024)

Abstract
BACKGROUND: Venous thromboembolism (VTE) is a major cause of morbidity and mortality worldwide. Current risk assessment tools, such as the Caprini and Padua scores and the Wells criteria, have limitations in their applicability and accuracy. This study aimed to develop machine learning models that use structured electronic health record data to predict VTE diagnosis and 1-year VTE risk.

METHODS: We trained and validated models on data from 159 001 participants in the Mount Sinai Data Warehouse, then externally tested them on 401 723 participants in the UK Biobank and 123 039 participants in All of Us. All data sets contain populations of diverse ancestries and clinical histories. Using these data sets, we developed small, medium, and large models with increasing numbers of features, spanning a range from optimizing portability to maximizing performance. We make the trained models publicly available in click-and-run format at https://doi.org/10.17632/tkwzysr4y6.6.

RESULTS: In the holdout and external test sets, respectively, models achieved areas under the receiver operating characteristic curve (AUROC) of 0.80 to 0.83 and 0.72 to 0.82 for VTE diagnosis prediction, and 0.76 to 0.78 and 0.64 to 0.69 for 1-year risk prediction, significantly outperforming the Padua score. Models also performed robustly across VTE types and across patient subsets defined by ethnicity, age, and surgical and hospitalization status. Models identified both established and novel clinical features contributing to VTE risk, offering insights into the underlying pathophysiology of VTE.

CONCLUSIONS: Machine learning models using structured electronic health record data can significantly improve VTE diagnosis and 1-year risk prediction in diverse populations. Model probability scores lie on a continuum and are associated with mortality risk in both healthy individuals and VTE cases. Integrating these models into electronic health record systems to generate real-time predictions may enhance VTE risk assessment, early detection, and preventive measures, ultimately reducing the morbidity and mortality associated with VTE.
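The headline results are reported as AUROC values. As a reading aid, an AUROC of 0.80 means that for a randomly chosen pair of one VTE case and one non-case, the model gives the case the higher score about 80% of the time (ties counting as half). The following is a minimal Python sketch of that pairwise (Mann-Whitney) definition on toy data; it is not the authors' evaluation code, and the function name `auroc` and the example scores and labels are hypothetical.

```python
from itertools import product


def auroc(scores, labels):
    """AUROC as the fraction of (case, non-case) pairs in which the case
    scores higher; tied pairs count as half (Mann-Whitney U / (n_pos * n_neg)).

    labels: 1 = VTE case, 0 = non-case (hypothetical encoding).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # case correctly ranked above non-case
        elif p == n:
            wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores: three cases, three non-cases.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
# 8 of the 9 case/non-case pairs are ordered correctly, so this prints 8/9 (about 0.889).
print(auroc(scores, labels))
```

The explicit pairwise loop is O(n_cases x n_non-cases) and is meant only to make the definition visible; for cohorts of the size described above, a sort-based implementation such as scikit-learn's `sklearn.metrics.roc_auc_score` would be the practical choice.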
Keywords
machine learning, medical records, morbidity, risk assessment, thrombosis