Development and validation of an artificial intelligence model for predicting post-transplant hepatocellular cancer recurrence.

Quirino Lai, Carmine De Stefano, Jean Emond, Prashant Bhangui, Toru Ikegami, Benedikt Schaefer, Maria Hoppe-Lotichius, Anna Mrzljak, Takashi Ito, Marco Vivarelli, Giuseppe Tisone, Salvatore Agnes, Giuseppe Maria Ettorre, Massimo Rossi, Emmanuel Tsochatzis, Chung Mau Lo, Chao-Long Chen, Umberto Cillo, Matteo Ravaioli, Jan Paul Lerut

Cancer Communications (London, England) (2023)

Abstract

Dear Editor,

In recent years, criteria combining tumor morphology and biology have been proposed to improve the selection of hepatocellular cancer (HCC) patients waiting for liver transplantation (LT) [1, 2]. Since all the proposed models showed suboptimal results in predicting the risk of post-LT recurrence, a prediction model constructed using artificial intelligence (AI) could be an attractive way to overcome this limit [3, 4]. Therefore, the Time_Radiological-response_Alpha-fetoproteIN_Artificial-Intelligence (TRAIN-AI) model was developed, combining morphological and biological tumor variables.

A Training Set (n = 2,936) derived from an International Cohort was used to create the model. A Validation Set (n = 734) derived from the same International Cohort and an external Test Set (n = 356) were used for internal and external validation of TRAIN-AI, respectively (Supplementary Figure S1). The Training and Validation Sets had similar characteristics (Supplementary Table S1). Conversely, relevant differences were observed when the Test Set was compared with the Validation Set; external validation was therefore performed in a population (the Test Set) very different from the one in which TRAIN-AI was derived and internally tested (the Training and Validation Sets) (Supplementary Table S2). Eight variables were significantly associated with the risk of recurrence and were used to construct the TRAIN-AI model: target lesion diameter, number of nodules, alpha-fetoprotein, length of waiting time, radiological response, model for end-stage liver disease (MELD), living donor liver transplantation, and center volume (Supplementary Table S3). The statistical approaches used for constructing the model are reported in the Supplementary Material.
The average impact of each factor on the magnitude of the model output was explored; the number of nodules and alpha-fetoprotein (AFP) were identified as the most relevant variables (Supplementary Figure S2). Table 1 summarizes the accuracy of the TRAIN-AI model compared with several currently adopted criteria for predicting post-LT HCC recurrence [1, 5-7].

Internal validation was performed using the Validation Set data. Time-dependent concordance by Antolini et al. [8] showed that the TRAIN-AI model had the best accuracy (concordance = 0.77; 95% confidence interval [CI] = 0.72-0.82). The TRAIN-AI model consistently outperformed the other criteria (AFP-French model concordance = 0.68; Metroticket 2.0 = 0.68; Milan Criteria [MC] = 0.63) (Table 1). To quantify the improvement in prediction obtained with the TRAIN-AI score, the Brier score and the Brier skill score were calculated. TRAIN-AI had the best value among the different criteria (Brier score = 0.10). When TRAIN-AI was compared with each of the other scores, prediction improved in every case; the largest improvement was over MC (Brier skill score +14.26%) (Table 1). TRAIN-AI also had the best Harrell c-statistic for the 5-year recurrence risk (concordance = 0.77, 95% CI = 0.71-0.82), being markedly superior to the other criteria (AFP-French model = 0.67, P < 0.001; Metroticket 2.0 = 0.68, P < 0.001; MC = 0.64, P < 0.001) (Table 1). Sub-analyses confirmed the prognostic ability of TRAIN-AI in the settings of hepatitis C or hepatitis B virus positivity, LT performed in Asia or Europe, and patients exceeding the MC (Supplementary Table S4).

In the Test Set data, the TRAIN-AI model also had the best concordance (concordance = 0.77; 95% CI = 0.70-0.84).
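The two families of metrics reported here can be illustrated with a minimal Python sketch. It is not the authors' implementation (their methods are in the Supplementary Material): the function names are hypothetical, the Brier score below treats the outcome as a plain 0/1 label and ignores censoring, and the concordance function is a simplified Harrell's C that skips pairs with tied follow-up times. The patient data are invented for illustration only.

```python
from itertools import combinations


def brier_score(predicted_risk, observed):
    """Mean squared difference between predicted risk (0-1) and the 0/1 outcome.
    Simplification: censoring is ignored."""
    return sum((p - y) ** 2 for p, y in zip(predicted_risk, observed)) / len(observed)


def brier_skill_score(bs_model, bs_reference):
    """Relative reduction in Brier score versus a reference criterion, in percent.
    Positive values mean the model predicts better than the reference."""
    return (1.0 - bs_model / bs_reference) * 100.0


def harrell_c(times, events, risk):
    """Simplified Harrell's C: among comparable pairs, the share in which the
    patient who recurred earlier also had the higher predicted risk.
    Ties in predicted risk count as 0.5; pairs with tied times are skipped."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # not comparable: tied times, or the earlier patient was censored
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    return concordant / comparable


# Invented toy data: follow-up in months, recurrence flag, predicted risk.
times = [10, 20, 30, 40]
events = [1, 1, 0, 0]
risk = [0.6, 0.2, 0.3, 0.1]

c_index = harrell_c(times, events, risk)
bs = brier_score(risk, events)
```

A concordance of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the values quoted for each criterion should be read.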
The TRAIN-AI model consistently outperformed the other criteria (Metroticket 2.0 = 0.69; AFP-French model = 0.67; MC = 0.66) (Table 1). The TRAIN-AI Brier score was the best among the different criteria (Brier score = 0.10). When TRAIN-AI was compared with each of the other scores, prediction improved in every case; the largest improvement was over MC (Brier skill score +13.94%) (Table 1). The TRAIN-AI c-statistic for the risk of 5-year recurrence was the best observed (concordance = 0.78, 95% CI = 0.71-0.85), being markedly superior to the other criteria (Metroticket 2.0 = 0.69, P = 0.020; MC = 0.67, P = 0.007; AFP-French model = 0.66, P = 0.006) (Table 1).

A user-friendly web calculator for the model was constructed (https://train-ai.cloud) and made available for calculating the expected recurrence after LT in individual patients. After stratifying the explored populations into three 5-year recurrence risk classes (low: ≤ 15%; intermediate: 16%-30%; high: > 30%), the expected and observed recurrence rates were compared in the Validation and Test Sets (Supplementary Figure S3). Taking a Hosmer-Lemeshow test P < 0.050 to indicate poor calibration, the test showed good calibration in the Validation Set (P = 0.540) and in the Test Set (P = 0.380) (Supplementary Figure S3).

This is the largest deep learning-based prediction model published in this field. TRAIN-AI outperformed several currently used HCC selection criteria in both internal and external validation. A user-friendly web calculator was also created to calculate each patient's recurrence risk. The proposed model is based only on well-recognized variables readily available worldwide, which allows high rates of standardization, completeness, and granularity. Another relevant aspect of this AI model is that it can continuously evolve with further data accumulation.
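The expected-versus-observed comparison by risk class can be sketched as follows. This is an illustrative Python example, not the authors' code: the function names are hypothetical, non-integer risks between 15% and 16% are assigned to the intermediate class by assumption, and the observed rate is a crude proportion that ignores censoring (a survival-based estimate would be needed for real follow-up data). The input values are invented.

```python
from collections import defaultdict


def risk_class(five_year_risk_pct):
    """Map a predicted 5-year recurrence risk (in %) to one of three classes.
    Assumption: values above 15 and up to 30 are treated as intermediate."""
    if five_year_risk_pct <= 15:
        return "low"
    if five_year_risk_pct <= 30:
        return "intermediate"
    return "high"


def expected_vs_observed(predicted_pct, recurred):
    """Per risk class, compare the mean predicted risk (expected, %) with the
    crude share of patients who recurred (observed, %)."""
    groups = defaultdict(list)
    for p, y in zip(predicted_pct, recurred):
        groups[risk_class(p)].append((p, y))
    summary = {}
    for name, rows in groups.items():
        n = len(rows)
        summary[name] = {
            "n": n,
            "expected_pct": sum(p for p, _ in rows) / n,
            "observed_pct": 100.0 * sum(y for _, y in rows) / n,
        }
    return summary


# Invented toy data: predicted 5-year risk (%) and recurrence flag per patient.
predicted = [5, 10, 20, 25, 40, 50]
recurred = [0, 0, 0, 1, 1, 0]
summary = expected_vs_observed(predicted, recurred)
```

Close agreement between `expected_pct` and `observed_pct` within each class is what a well-calibrated model would show; the Hosmer-Lemeshow test formalizes that comparison.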
The web calculator allows TRAIN-AI to improve its prognostic performance through continuous enlargement of the training data. To enable this improvement, two collaborative international consortia that routinely update their data (the EurHeCaLT and the East-West LT Study Groups) have been involved in this project.

Recently, two studies used AI models to address post-LT HCC recurrence [3, 4]. The main disadvantage of these studies was the limited number of patients available for model development and training. Deep learning models typically require thousands of cases. This shortcoming is not present in our study, in which the Training Set comprised 2,936 patients. Another relevant problem was prediction "overfitting", which may generate overly optimistic results [9]. This problem is relevant when training and validation sets derive from the same population. To address this limit, we externally tested the model in a geographically different population. The Training and Validation Sets were composed of Euro-Asiatic patients with short waiting times, one-third of cases with living donation, and three-quarters of cases with neo-adjuvant therapies. Conversely, the Test Set was based on North American patients with long waiting times, fewer cases of living donation, and almost all cases treated with neo-adjuvant therapies. Despite these differences, the concordance of TRAIN-AI was consistently very good (0.77 in both the Validation and Test Sets) (Table 1), with a marked percentage improvement in prediction over all the other criteria.

The present study has some limitations. First, the internal operations by which deep learning produces its output cannot be interpreted. Second, the study is retrospective. Third, some variables were not used for the TRAIN-AI construction, such as des-gamma carboxy-prothrombin, inflammatory markers, radiologically detectable macrovascular invasion, and radiomics [10].
The TRAIN-AI model showed higher accuracy than other frequently used scores for the risk of post-LT HCC recurrence. A user-friendly web calculator has been developed to improve the model's availability. A tailored and justified transplantability cutoff can be proposed by stratifying patients into recurrence risk classes. The AI model's predictions can be further improved by increasing the number of patients used for training.

Austria: Andre Viveiros (University of Innsbruck, Innsbruck); Belgium: Samuele Iesari (Université Catholique de Louvain, Brussels), Olga Ciccarelli (UCL, Brussels); Croatia: Branislav Kocman (University of Zagreb, Zagreb); Germany: Jens Mittler (University of Mainz, Mainz); Hong Kong: Tiffany Wong (University of Hong Kong, Hong Kong); India: Arvinder Singh Soin (Medanta-The Medicity, Gurgaon); Italy: Federico Mocchegiani (Polytechnic University of Marche, Ancona), Matteo Cescon (University of Bologna, Bologna), Alessandro Vitale (University of Padua, Padua), Gianluca Mennini (Sapienza University, Rome), Tommaso Maria Manzia (PTV University, Rome), Alfonso W. Avolio (Catholic University, Rome), Gabriele Spoletini (Catholic University, Rome), Marco Colasanti (San Camillo Hospital, Rome); Japan: Tomoharu Yoshizumi (Kyushu University, Fukuoka), Toshimi Kaido, Etsurou Hatano (Graduate School of Medicine, Kyoto); Taiwan: Chih Che Lin (Kaohsiung, Taiwan); United Kingdom: Margarita Papatheodoridi (Royal Free Hospital, London), Simona Onali (Royal Free Hospital, London); United States of America: Karim Halazun (Columbia University, New York).
Quirino Lai and Carmine De Stefano contributed to the conception and design of the study; Quirino Lai, Prashant Bhangui, Toru Ikegami, Benedikt Schaefer, Maria Hoppe-Lotichius, Anna Mrzljak, Takashi Ito, Marco Vivarelli, Giuseppe Tisone, Salvatore Agnes, Giuseppe Maria Ettorre, Massimo Rossi, Emmanuel Tsochatzis, Chung Mau Lo, Chao-Long Chen, Umberto Cillo, Matteo Ravaioli, and Jan Paul Lerut contributed to the acquisition of data; Quirino Lai and Carmine De Stefano analyzed and interpreted the data; Quirino Lai, Carmine De Stefano, and Jan Paul Lerut drafted the article; Jean Emond, Toru Ikegami, Benedikt Schaefer, Maria Hoppe-Lotichius, Marco Vivarelli, Emmanuel Tsochatzis, and Matteo Ravaioli critically revised the manuscript; and all authors approved the final version.

None. The authors have no conflicts of interest to declare about the present study. The authors have not received any support for the present study, and no specific funding was used for this study. The study was performed according to the Declaration of Helsinki. The study was approved by the Umberto I Policlinico of Rome Institutional Review Board (Approval number: 1000/2018). Not applicable.
