Comparison of causal forest and regression-based approaches to evaluate treatment effect heterogeneity: An application for type 2 diabetes precision medicine

BMC Medical Informatics and Decision Making (2022)

Abstract
Objective: To compare individualized treatment selection strategies based on predicted individual-level treatment effects from a causal forest machine learning algorithm and from a penalized regression model.

Study Design and Setting: Cohort study characterizing individual-level glucose-lowering response (6-month reduction in HbA1c) in people with type 2 diabetes initiating SGLT2-inhibitor or DPP4-inhibitor therapy. The model development set comprised 1,428 participants in the CANTATA-D and CANTATA-D2 trials (SGLT2-inhibitor versus DPP4-inhibitor). For external validation, calibration of observed versus predicted differences in HbA1c, within patient strata defined by the size of the predicted HbA1c benefit, was evaluated in 18,741 UK primary care patients (Clinical Practice Research Datalink).

Results: Heterogeneity in treatment effects was detected in trial participants with both approaches (causal forest: 98.6%; penalized regression: 81.7% predicted to have a benefit on SGLT2-inhibitor therapy over DPP4-inhibitor therapy). In validation, calibration was good with penalized regression but sub-optimal with causal forest. A stratum with an HbA1c benefit >10 mmol/mol with SGLT2-inhibitors (3.7% of patients, observed benefit 11.0 mmol/mol [95% CI 8.0-14.0]) was identified using penalized regression but not causal forest. A much larger stratum with an HbA1c benefit of 5-10 mmol/mol with SGLT2-inhibitors was identified with penalized regression than with causal forest (regression: 20.9% of patients, observed benefit 7.8 mmol/mol [95% CI 6.7-8.9]; causal forest: 11.6% of patients, observed benefit 8.7 mmol/mol [95% CI 7.4-10.1]).

Conclusion: When evaluating treatment effect heterogeneity, researchers should not rely on causal forest (or other similar machine learning algorithms) alone, and must compare outputs with standard regression.

Question: What is the utility of machine learning compared with standard regression for identifying variation in patient-level outcomes due to different treatments (treatment effect heterogeneity)?
Findings: Causal forest and penalized regression models were developed using trial data to predict glycated hemoglobin (HbA1c) outcomes with SGLT2-inhibitor and DPP4-inhibitor therapy in 1,428 individuals with type 2 diabetes. In external validation (18,741 patients), penalized regression outperformed causal forest at identifying population strata with a superior glycemic response to SGLT2-inhibitors compared to DPP4-inhibitors.

Implications: Studies estimating treatment effect heterogeneity should not solely rely on machine learning and should compare results with standard regression.

### Competing Interest Statement

BAM is an employee of the Wellcome Trust and holds an honorary post at University College London for the purposes of carrying out independent research; the views expressed in this manuscript do not necessarily reflect the views of the Wellcome Trust. SJV declares funding from IQVIA. All other authors declare no competing interests.

### Funding Statement

This research was supported by a BHF-Turing Cardiovascular Data Science Award (SP/19/6/34809), and the Medical Research Council (UK) (MR/N00633X/1).

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Approval and data access for the study was granted by the CPRD Independent Scientific Advisory Committee (ISAC 13_177R), and the YODA Project (#2017-1816).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes

No additional data are available from the authors, although CPRD data are available by application to the CPRD Independent Scientific Advisory Committee, and the clinical trial data are accessible via application to the Yale University Open Data Access Project.
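
The validation step described in the abstract, comparing predicted and observed HbA1c benefit within strata defined by the size of the predicted benefit, can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratum_calibration`, the synthetic data, and the unadjusted difference in arm means are all assumptions made for illustration, and an analysis of observational data would also need to adjust for confounding. The stratum boundaries (0, 5 and 10 mmol/mol) follow the strata named in the Results.

```python
import numpy as np

def stratum_calibration(pred_benefit, on_sglt2, hba1c_change, edges):
    """Compare mean predicted benefit with the observed (unadjusted) benefit
    in each stratum of predicted benefit. Hypothetical helper for illustration."""
    pred_benefit = np.asarray(pred_benefit, dtype=float)
    on_sglt2 = np.asarray(on_sglt2, dtype=bool)
    hba1c_change = np.asarray(hba1c_change, dtype=float)

    # Bin index: 0 for values below edges[0], len(edges) for values >= edges[-1].
    stratum = np.digitize(pred_benefit, edges)

    rows = []
    for s in range(len(edges) + 1):
        in_s = stratum == s
        sglt2 = hba1c_change[in_s & on_sglt2]
        dpp4 = hba1c_change[in_s & ~on_sglt2]
        if len(sglt2) == 0 or len(dpp4) == 0:
            continue  # both arms are needed to estimate a difference
        rows.append({
            "stratum": s,
            "share": float(in_s.mean()),
            "predicted": float(pred_benefit[in_s].mean()),
            # HbA1c change is negative for a reduction, so the benefit of
            # SGLT2-inhibitors is the DPP4 mean minus the SGLT2 mean.
            "observed": float(dpp4.mean() - sglt2.mean()),
        })
    return rows

# Synthetic data only: every number below is invented for the demonstration.
rng = np.random.default_rng(0)
n = 2000
true_benefit = rng.normal(3.0, 4.0, n)           # mmol/mol, hypothetical
on_sglt2 = rng.random(n) < 0.5                   # randomized arm assignment
hba1c_change = -8.0 - on_sglt2 * true_benefit + rng.normal(0.0, 10.0, n)
pred = true_benefit + rng.normal(0.0, 1.0, n)    # a noisy hypothetical model

for row in stratum_calibration(pred, on_sglt2, hba1c_change, [0.0, 5.0, 10.0]):
    print(row)
```

A well-calibrated model would show observed benefits close to the mean predicted benefit in each stratum; large gaps in the upper strata would indicate the kind of sub-optimal calibration the abstract reports for causal forest.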
Keywords
treatment effect heterogeneity, causal forest, diabetes precision medicine, regression-based