Identification of Patients at Risk for Pancreatic Cancer in a 3-Year Timeframe Based on Machine Learning Algorithms in the Electronic Health Record

AMERICAN JOURNAL OF GASTROENTEROLOGY(2021)

Abstract
Importance: Early detection of pancreatic cancer (PC) remains challenging, largely due to the low population incidence and few known risk factors. However, screening in at-risk populations and detection of early cancer have the potential to significantly alter survival.

Objective: In this study, we aim to develop a predictive model to identify patients at risk for developing new-onset PC within a 2.5- to 3-year time frame.

Data Sources: We used the Electronic Health Records (EHR) of a large medical system from 2000 to 2021 (N=537,410). The EHR data analyzed in this work consist of patients' demographic information, diagnosis records, and lab values, which are used to identify patients who were diagnosed with pancreatic cancer and the risk factors used in the machine learning algorithm for prediction.

Results: We identified 73 risk factors for pancreatic cancer with a Phenome-wide Association Study (PheWAS) on a matched case-control cohort. Using these risk factors, we built a large-scale EHR-based machine learning algorithm. A temporally stratified validation was performed on patients not included in any stage of model training. The model showed an AUROC of 0.742 [0.727, 0.757], which was similar in both the general population and a subset of the population who had prior cross-sectional imaging. The prevalence of pancreatic cancer among those in the top 5% of risk scores was 6-fold higher than in the general population.

Conclusion: Our model leverages data extracted from a 6-month window of time in the electronic health record to identify patients at nearly 6-fold higher than baseline risk of developing pancreatic cancer 2.5 to 3 years from evaluation. This approach offers an opportunity to define an enriched population, based entirely on static data, for whom current screening may be recommended.

### Competing Interest Statement

The authors have declared no competing interest.
### Funding Statement

This study did not receive any funding.

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: New York University-Langone Health waived IRB approval for this dataset as it was used for Quality Improvement purposes.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes

Due to protection of private patient information, the data used in the study will not be made publicly available on the basis of individual collaborations. If an institution-wide agreement between NYU Langone and the requesting institutions is generated, we will share the agreed data elements under data use agreement terms.
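As a rough illustration of the two headline metrics reported in the Results (an AUROC with an interval, and the fold enrichment of cases among the top 5% of risk scores), the sketch below computes both on synthetic data. The function names, the percentile bootstrap interval, and the simulated scores are assumptions made for illustration; the abstract does not state how the authors computed their interval, and nothing here reproduces their model or data.

```python
import random


def auroc(scores, labels):
    """AUROC via the Mann-Whitney rank-sum statistic (average ranks for ties)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fold_enrichment(scores, labels, top_fraction=0.05):
    """Case prevalence in the highest-scoring fraction divided by overall prevalence."""
    n = len(scores)
    n_top = max(1, round(n * top_fraction))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_top]
    pos_top = sum(labels[i] for i in top)
    # (pos_top / n_top) / (total_pos / n), arranged to limit rounding error
    return (pos_top * n) / (n_top * sum(labels))


def bootstrap_auroc_ci(scores, labels, n_resamples=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


if __name__ == "__main__":
    # Synthetic cohort: rare outcome, cases score somewhat higher on average.
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.05 else 0 for _ in range(2000)]
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    print("AUROC:", round(auroc(scores, labels), 3))
    print("Bootstrap interval:", bootstrap_auroc_ci(scores, labels))
    print("Fold enrichment, top 5%:", round(fold_enrichment(scores, labels), 2))
```

Fold enrichment is the quantity behind a statement such as "6-fold higher than the general population": it compares prevalence within the flagged group to prevalence in the whole cohort, so it depends on both the model's ranking quality and the chosen cutoff.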
Keywords
pancreatic cancer,machine learning