Common Problems With the Usage of F-Measure and Accuracy Metrics in Medical Research

IEEE Access (2023)

Abstract
Problem: Binary classifiers are widely used in medical research, especially for diagnoses. They are usually evaluated via performance metrics computed from confusion matrices. Accuracy and F-measure are among the most frequently used performance metrics, but they make implicit assumptions and do not take into account important characteristics of classifiers. As a consequence, evaluations based on Accuracy or F-measure may turn out to be incorrect, unreliable, and inadequate for the specific application context. The usage of Accuracy and F-measure is particularly critical in the medical domain, where selecting a sub-optimal classifier may lead to incorrect diagnoses, with potentially serious or even fatal consequences.

Aim: We investigated whether the improper or naive usage of Accuracy and F-measure can lead to partial or incorrect evaluations. If this is the case, we need a procedure to reinterpret the conclusions reported in research articles, whenever possible.

Method: After discussing a few important properties of Accuracy and F-measure, we examine a set of representative research articles to assess their conclusions, and illustrate a procedure to reinterpret those conclusions.

Results: The conclusions of the examined research articles appear to be largely affected by the performance metrics used, which in some cases lead to very misleading conclusions. Applying the proposed procedure allows the retrieval of confusion matrices and the derivation of reliable indications of classifiers' performance.

Conclusion: F-measure and Accuracy should be used with care, with awareness of their characteristics and limits. We recommend that future evaluations of binary classifiers be provided with the complete confusion matrices, so that users can formulate evaluations based on specific contexts and priorities.
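The following Python sketch illustrates, under stated assumptions, why reporting only Accuracy or F-measure can hide important characteristics of a classifier. It is not the procedure proposed in the paper. The `ConfusionMatrix` class, the function names, and all counts are hypothetical and chosen only for illustration. F-measure is taken here to be the balanced F1 score, which may differ from the variant used in a given study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for the four cells of a binary confusion matrix."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy(cm: ConfusionMatrix) -> float:
    # Fraction of all cases classified correctly.
    total = cm.tp + cm.fp + cm.fn + cm.tn
    return (cm.tp + cm.tn) / total if total else 0.0


def f_measure(cm: ConfusionMatrix) -> float:
    # Balanced F1 = 2*TP / (2*TP + FP + FN); note that TN does not appear.
    denom = 2 * cm.tp + cm.fp + cm.fn
    return 2 * cm.tp / denom if denom else 0.0


def sensitivity(cm: ConfusionMatrix) -> float:
    # Fraction of actual positives that are detected.
    positives = cm.tp + cm.fn
    return cm.tp / positives if positives else 0.0


def specificity(cm: ConfusionMatrix) -> float:
    # Fraction of actual negatives that are correctly rejected.
    negatives = cm.tn + cm.fp
    return cm.tn / negatives if negatives else 0.0


# Made-up example 1: 1000 patients, 50 of them diseased, and a classifier
# that labels everyone as healthy. Accuracy is high, yet no case is detected.
always_negative = ConfusionMatrix(tp=0, fp=0, fn=50, tn=950)

# Made-up example 2: two classifiers that differ only in the number of
# true negatives. F1 cannot tell them apart; specificity and accuracy can.
small_tn = ConfusionMatrix(tp=40, fp=10, fn=10, tn=40)
large_tn = ConfusionMatrix(tp=40, fp=10, fn=10, tn=940)

if __name__ == "__main__":
    for name, cm in [("always_negative", always_negative),
                     ("small_tn", small_tn),
                     ("large_tn", large_tn)]:
        print(f"{name}: accuracy={accuracy(cm):.3f} "
              f"F1={f_measure(cm):.3f} "
              f"sensitivity={sensitivity(cm):.3f} "
              f"specificity={specificity(cm):.3f}")
```

With the full confusion matrix available, a reader can compute whichever of these quantities matters for the clinical context, which is in line with the recommendation in the Conclusion.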
Keywords
Measurement, Frequency modulation, Medical diagnostic imaging, Information retrieval, Estimation, Web search, Sensitivity, Accuracy, binary classifiers, F-measure, F-score, performance metrics