Data Farming to Table: Combined Use of a Learning Health System Infrastructure, Statistical Profiling, and Artificial Intelligence for Automating Toxicity and 3-year Survival for Quantified Predictive Feature Discovery from Real-World Data for Patients Having Head and Neck Cancers

medRxiv (Cold Spring Harbor Laboratory) (2023)

Abstract
Introduction: Clinicians iteratively adjust treatment approaches to improve outcomes, but to date automatable approaches for continuous learning of risk factors as these adjustments are made are lacking. We combined a large-scale, comprehensive, real-world Learning Health System infrastructure (LHSI) with automated statistical profiling, visualization, and artificial intelligence (AI) approaches to test evidence-based discovery of clinical factors for three use cases: dysphagia, xerostomia, and 3-year survival for head and neck cancer patients. Our hypothesis was that the combination would enable automated discovery of prognostic features, generating testable insights.

Methods: Records for 964 patients treated at a single institution for head and neck cancers with conventional fractionation between 2017 and 2022 were used. Information on demographics, diagnosis and staging, social determinants of health measures, chemotherapy, radiation therapy dose volume histogram curves and treatment details, laboratory values, and outcomes from the LHSI was combined to winnow evidence for 485 candidate prognostic features. Univariate statistical profiling detailed the predictive evidence of individual features, using benchmark resampling to detail confidence intervals for thresholds and for the following metrics: area under the curve (AUC), sensitivity (SN), specificity (SP), F1, diagnostic odds ratio (DOR), p values for Wilcoxon Rank Sum (WRS) and Kolmogorov-Smirnov (KS) tests, and logistic fits of distributions. Statistical profiling was used to benchmark parsimonious XGBoost models, which were constructed with 10-fold cross validation using training (70%), validation (10%), and test (20%) sets. Probabilistic models utilizing the logistic fits of distributions from statistical profiling were also used to benchmark the XGBoost models.

Results: Automated standardized analysis identified novel features and clinical thresholds. The validity of the automated findings was affirmed with supporting literature benchmarks.
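The univariate profiling metrics named in the Methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy data, and the choice of a percentile bootstrap for the resampling step are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random

NAN = float("nan")

def threshold_metrics(values, outcomes, threshold):
    """SN, SP, F1 and DOR for the rule 'value >= threshold predicts the event'."""
    tp = fp = tn = fn = 0
    for v, y in zip(values, outcomes):
        predicted = v >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    if fp and fn:
        dor = (tp * tn) / (fp * fn)
    else:
        # Zero denominator: infinite if the numerator is positive, else undefined.
        dor = float("inf") if tp * tn > 0 else NAN
    return {
        "SN": tp / (tp + fn) if tp + fn else NAN,
        "SP": tn / (tn + fp) if tn + fp else NAN,
        "F1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else NAN,
        "DOR": dor,
    }

def auc(values, outcomes):
    """AUC as the probability that an event case outranks a non-event case."""
    pos = [v for v, y in zip(values, outcomes) if y]
    neg = [v for v, y in zip(values, outcomes) if not y]
    if not pos or not neg:
        return NAN
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(values, outcomes, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling patients with replacement
    (an assumed resampling scheme, not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(values)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        s = metric([values[i] for i in idx], [outcomes[i] for i in idx])
        if not math.isnan(s):  # skip resamples lacking one of the two classes
            stats.append(s)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi

# Synthetic toy data (NOT study data): a dose-like feature and a binary outcome.
toy_values = [10, 20, 30, 40, 50, 60, 70, 80]
toy_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
toy_metrics = threshold_metrics(toy_values, toy_outcomes, threshold=40)
toy_auc_ci = bootstrap_ci(toy_values, toy_outcomes, auc, n_resamples=200)
```

Reporting a threshold together with an interval for each metric, rather than a single point value, is what lets a reader judge how stable a candidate feature is before it is passed to a multivariable model.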
The average incidence of dysphagia ≥ grade 3 within 1 year of treatment was low (11%). The incidence of xerostomia ≥ grade 2 decreased (39% to 16%), as did the incidence of survival ≤ 3 years (25% to 15%), over the time range. The standard planning constraints in use limited the contribution of those features: Musc_Constrict_S: Mean[Gy] < 50, Glnd_Submand_High: Mean[Gy] ≤ 30, Glnd_Submand_Low: Mean[Gy] ≤ 10, Parotid_High: Mean[Gy] ≤ 24, Parotid_Low: Mean[Gy] ≤ 10. Additional prognostic features identified for dysphagia included Glnd_Submand_High: D1%[Gy] ≥ 71.1, Glnd_Submand_Low: D4%[Gy] ≥ 55.1, Musc_Constrict_S: D10%[Gy] ≥ 56.5, and GTV_Low: Mean[Gy] ≥ 71.3. The strongest grade 2 xerostomia feature was Glnd_Submand_Low: D15%[Gy] ≥ 45.2, with a logistic model quantifying a gradual rather than an abrupt increase in probability 13.5 + 0.18 (x-41.0 Gy). The strongest prognostic factors for lower likelihood of death by 3 years were GTV_High: Volume[cc] ≤ 21.1, GTV_Low: Volume[cc] ≤ 57.5, baseline Neutrophil-Lymphocyte Ratio (NLR) ≤ 5.6, Monocyte-Lymphocyte Ratio (MLR) ≤ 0.56, and Platelet-Lymphocyte Ratio (PLR) ≤ 202.5. All predictors had WRS and KS p values < 0.02. Statistical profiling enabled detailing the gains of XGBoost models with respect to individual features. Reductions over the time period in the distribution of GTV volumes correlated with reductions in death by 3 years.

Discussion: Confirming our hypothesis, automated, standardized statistical profiling with a set of statistical metrics and visualizations supported detailing the predictive strength and confidence intervals of individual features, benchmarking of subsequent AI models, and clinical assessment. The association of high dose values to submandibular gland volumes highlighted their relevance as surrogate measures for proximal un-contoured muscles, including the digastric muscles. Higher values of PLR, NLR, and MLR were associated with lower survival rates.
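As a worked illustration of the blood-count ratios discussed above, the following minimal sketch computes NLR, MLR, and PLR from baseline counts and compares them with the thresholds reported in the Results. The function names and example counts are hypothetical, all counts are assumed to share the same units, and the flags only restate the reported univariate associations; they are not a validated clinical rule.

```python
# Thresholds as reported in the abstract: values at or below these were
# associated with a lower likelihood of death by 3 years.
REPORTED_THRESHOLDS = {"NLR": 5.6, "MLR": 0.56, "PLR": 202.5}

def inflammatory_ratios(neutrophils, lymphocytes, monocytes, platelets):
    """Return NLR, MLR and PLR; every count must be in the same units."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return {
        "NLR": neutrophils / lymphocytes,  # neutrophil-lymphocyte ratio
        "MLR": monocytes / lymphocytes,    # monocyte-lymphocyte ratio
        "PLR": platelets / lymphocytes,    # platelet-lymphocyte ratio
    }

def exceeds_thresholds(ratios, thresholds=REPORTED_THRESHOLDS):
    """Flag each ratio that lies above its reported threshold."""
    return {name: ratios[name] > cut for name, cut in thresholds.items()}

# Hypothetical counts for illustration only (not patient data).
example = inflammatory_ratios(neutrophils=4.0, lymphocytes=2.0,
                              monocytes=0.5, platelets=250.0)
example_flags = exceeds_thresholds(example)
```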
Combined use of a Learning Health System Infrastructure, Statistical Profiling, and Artificial Intelligence provided a basis for faster, more efficient, evidence-based continuous learning of risk factors and for the development of hypotheses testable in clinical trials. Benchmarking AI models against simple probabilistic models provided a means of understanding when results are driven by general areas of overall risk vs. more complex interactions.

### Competing Interest Statement

The authors have declared no competing interest.

### Funding Statement

This study did not receive any funding.

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

University of Michigan, Michigan Medicine Institutional Review Board approval waived for use of retrospective data.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data on results produced in the present study are available upon reasonable request to the authors. Access to the underlying raw data is dependent on conditions of use and will require completion of an institutional data use agreement.
Keywords
learning health system infrastructure, quantified predictive feature discovery, automating toxicity, statistical profiling, data, real-world