A Review of the F-Measure: Its History, Properties, Criticism, and Alternatives

Peter Christen, David J. Hand, Nishadi Kirielle

ACM Computing Surveys (2024)

Abstract
Methods to classify objects into two or more classes are at the core of various disciplines. When a set of objects with their true classes is available, a supervised classifier can be trained and employed to decide if, for example, a new patient has cancer or not. The choice of performance measure is critical in deciding which supervised method to use in any particular classification problem. Different measures can lead to very different choices, so the measure should match the objectives. Many performance measures have been developed, and one of them is the F-measure, the harmonic mean of precision and recall. Originally proposed in information retrieval, the F-measure has gained increasing interest in the context of classification. However, the rationale underlying this measure appears weak, and unlike other measures, it does not have a representational meaning. The use of the harmonic mean also has little theoretical justification. The F-measure also stresses one class, which seems inappropriate for general classification problems. We provide a history of the F-measure and its use in computational disciplines, describe its properties, and discuss criticism about the F-measure. We conclude with alternatives to the F-measure, and recommendations of how to use it effectively.
Keywords
Supervised classification, performance assessment, F1-score, F1-measure, F*-measure, representational measure, pragmatic measure
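
The abstract describes the F-measure as the harmonic mean of precision and recall and notes that it stresses one class. A minimal sketch of that definition follows, assuming the balanced (F1) form and binary confusion-matrix counts. The function name `f_measure`, the zero-score convention for the case with no true positives, and the example counts are illustrative assumptions, not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives, and
    false negatives for the class treated as positive.
    """
    if tp == 0:
        # Assumed convention: with no true positives the score is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
score = f_measure(tp=80, fp=20, fn=40)
print(round(score, 4))
```

The count of true negatives never enters the computation, which is one way to see why the measure is said to stress a single class: swapping which class is labeled positive can change the score.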