Commentary: If a patient's life is at stake, let's not stop at the surface of the curves.

The Journal of Thoracic and Cardiovascular Surgery (2023)

Abstract
Central Message
Imbalanced data can lead to an overoptimistic assessment of performance in ROC curve analysis. The use of precision-recall curves and calibration tests is recommended. See Article page 1433.

The Journal publishes not only original research articles, which are obviously of an excellent level, but also articles on statistical methods and their practical implications. The article by Movahedi and colleagues [1] is one of them. The authors detail how a receiver operating characteristic (ROC) curve works and what misleading messages it can deliver under certain conditions, which (unfortunately or maybe not) occur very often in medicine. In fact, the problem addressed by this article is a very common phenomenon called data imbalance.

What every score does (and the reason it is called a classifier) is assign labels (eg, dead or alive) to patients. Comparing these labels with the actual outcomes places every patient in 1 of 4 categories: the true dead, the falsely labeled dead, the true survivors, and the falsely labeled survivors. When the actual outcome classes are very unequal in size (eg, far more patients survive than die), the data are imbalanced, and the ROC curve can lead to an overly optimistic interpretation of the score's performance: its false positive rate divides the falsely labeled dead by the large number of survivors, so even many false alarms barely move the curve. As examples, the authors use 2 mortality scores that predict the outcome after left ventricular assist device implantation (namely the HeartMate Risk Score [2] and the Random Forest [3]), but their observations apply to any other score. Movahedi and colleagues [1] discuss the problem and suggest, as a solution, a supplemental evaluation tool: the precision-recall curve [4].

The ROC curve is among the most widely used tools in medical statistics, and for this very reason it risks being misused. Like many things in life, ROC analysis was invented for another purpose: it was developed during World War II to decide whether a radar signal was an enemy airplane, and eventually whether or not to shoot it down [5]. It is now commonly applied by the medical-scientific community to evaluate scores that predict whether a patient will survive an intervention. Given the ethical implications (of the intervention, certainly, but also of shooting down an airplane), it is necessary to understand the limits of this technique with great care and to consider more parameters before taking action (eg, the precision-recall curve, without forgetting calibration tests such as the Brier score, which Murphy decomposed into reliability (calibration), resolution, and uncertainty components [6]).

The Roman poet Carlo Alberto Camillo Mariano Salustri (1871-1950), better known as Trilussa, effectively summarized the limits of statistics in his satirical verses, especially in the case of imbalanced data: “According to statistics, everyone has a chicken. And if you do not have one, that means someone else owns 2.” Obviously, if you are the person without a chicken and you are hungry, the statistic is of no significance to you.

References
1. Movahedi F, Padman R, Antaki JF. Limitations of receiver operating characteristic curve on imbalanced data: assist device mortality risk scores. J Thorac Cardiovasc Surg. 2023;165:1433-1442.e2.
2. Adamo L, Nassif M, Tibrewala A, Novak E, Vader J, Silvestry SC, et al. The Heartmate Risk Score predicts morbidity and mortality in unselected left ventricular assist device recipients and risk stratifies INTERMACS class 1 patients. JACC Heart Fail. 2015;4:283-290.
3. Smedira NG, Blackstone EH, Ehrlinger J, Thuita L, Pierce CD, Moazami N, et al. Current risks of HeartMate II pump thrombosis: non-parametric analysis of Interagency Registry for Mechanically Assisted Circulatory Support data. J Heart Lung Transplant. 2015;34:1527-1534.
4. Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One. 2015;10:e0118432.
5. Ferraris VA. Commentary: should we rely on receiver operating characteristic curves? From submarines to medical tests, the answer is a definite maybe! J Thorac Cardiovasc Surg. 2019;157:2354-2355.
6. Murphy AH. A new vector partition of the probability score. J Appl Meteorol. 1973;12:595-600.

Linked article
Limitations of receiver operating characteristic curve on imbalanced data: Assist device mortality risk scores. The Journal of Thoracic and Cardiovascular Surgery, Vol. 165, Issue 4.
In the left ventricular assist device domain, the receiver operating characteristic is a commonly applied metric of performance of classifiers. However, the receiver operating characteristic can provide a distorted view of classifiers’ ability to predict short-term mortality due to the overwhelmingly greater proportion of patients who survive, that is, imbalanced data. This study illustrates the ambiguity of the receiver operating characteristic in evaluating 2 classifiers of 90-day left ventricular assist device mortality and introduces the precision-recall curve as a supplemental metric that is more representative of left ventricular assist device classifiers in predicting the minority class.
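The commentary's central point, that imbalance flatters the ROC curve but not the precision-recall curve, can be checked on a toy example. What follows is a minimal Python sketch, not the method or data of Movahedi and colleagues: the scores are hypothetical and are treated as predicted probabilities of death, and the helper names (`make_cohort`, `confusion_counts`, `roc_auc`, `average_precision`, `brier_score`, `report`) are illustrative. The ranking of patients is kept fixed and only the number of survivors is multiplied, moving the cohort from 50% to 5% deaths. ROC AUC is computed as the probability that a randomly chosen death is scored above a randomly chosen survivor; average precision is the step-wise area under the precision-recall curve; the Brier score is the mean squared error of the predicted probabilities.

```python
"""Toy illustration with hypothetical data (not the data or code of Movahedi
and colleagues): the ranking of patients is kept fixed while survivors are
multiplied, so only the class balance changes."""

# Hypothetical predicted probabilities of death. Label 1 = dead, 0 = survived.
DEATH_SCORES = [0.9, 0.8, 0.7, 0.6, 0.5]
SURVIVOR_SCORES = [0.85, 0.45, 0.4, 0.3, 0.2]


def make_cohort(survivor_copies):
    """Return (labels, scores), repeating every survivor score survivor_copies times."""
    scores = DEATH_SCORES + SURVIVOR_SCORES * survivor_copies
    labels = [1] * len(DEATH_SCORES) + [0] * (len(SURVIVOR_SCORES) * survivor_copies)
    return labels, scores


def confusion_counts(labels, scores, threshold):
    """Count the 4 categories when a patient is labeled dead if score >= threshold:
    true dead, falsely labeled dead, true survivors, falsely labeled survivors."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_dead = s >= threshold
        if y == 1 and predicted_dead:
            tp += 1
        elif y == 0 and predicted_dead:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_auc(labels, scores):
    """Area under the ROC curve as the probability that a random death is
    scored above a random survivor (ties count one half)."""
    dead = [s for y, s in zip(labels, scores) if y == 1]
    alive = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for d in dead:
        for a in alive:
            if d > a:
                wins += 1.0
            elif d == a:
                wins += 0.5
    return wins / (len(dead) * len(alive))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve:
    sum over distinct thresholds of (gain in recall) * precision."""
    n_dead = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every patient tied at this score before computing the step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_dead
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def brier_score(labels, scores):
    """Mean squared difference between predicted probability and outcome (1 or 0)."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


def report(survivor_copies, threshold=0.5):
    """Collect the metrics discussed in the commentary for one hypothetical cohort."""
    labels, scores = make_cohort(survivor_copies)
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    prevalence = sum(labels) / len(labels)
    return {
        "prevalence": prevalence,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "roc_auc": roc_auc(labels, scores),
        "average_precision": average_precision(labels, scores),
        "brier": brier_score(labels, scores),
        # Brier score of a no-skill forecast giving every patient the prevalence.
        "brier_no_skill": prevalence * (1 - prevalence),
    }


if __name__ == "__main__":
    for name, copies in [("balanced, 50% deaths", 1), ("imbalanced, 5% deaths", 19)]:
        metrics = report(copies)
        print(name + ": " + ", ".join(f"{k}={v:.3f}" for k, v in metrics.items()))
```

In this toy setting the ROC AUC is 0.84 in both cohorts, and sensitivity (1.0) and specificity (0.8) at the 0.5 threshold are also unchanged, yet average precision falls from 0.81 to about 0.32 and precision from about 0.83 to about 0.21: in the imbalanced cohort roughly 4 of every 5 patients labeled dead actually survive. The Brier score of the imbalanced cohort (about 0.24) is also worse than the 0.0475 of a no-skill forecast that gives everyone the 5% prevalence, a miscalibration that the ROC curve, which depends only on the ranking of scores, cannot reveal.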
Keywords
patient, surface