AUPIMO: Redefining Visual Anomaly Detection Benchmarks with High Speed and Low Tolerance
CoRR(2024)
摘要
Recent advances in visual anomaly detection research have seen AUROC and
AUPRO scores on public benchmark datasets such as MVTec and VisA converge
towards perfect recall, giving the impression that these benchmarks are
near-solved. However, high AUROC and AUPRO scores do not always reflect
qualitative performance, which limits the validity of these metrics in
real-world applications. We argue that the artificial ceiling imposed by the
lack of an adequate evaluation metric restrains progression of the field, and
it is crucial that we revisit the evaluation metrics used to rate our
algorithms. In response, we introduce Per-IMage Overlap (PIMO), a novel metric
that addresses the shortcomings of AUROC and AUPRO. PIMO retains the
recall-based nature of the existing metrics but introduces two distinctions:
the assignment of curves (and respective area under the curve) is per-image,
and its X-axis relies solely on normal images. Measuring recall per image
simplifies instance score indexing and is more robust to noisy annotations. As
we show, it also accelerates computation and enables the usage of statistical
tests to compare models. By imposing low tolerance for false positives on
normal images, PIMO provides an enhanced model validation procedure and
highlights performance variations across datasets. Our experiments demonstrate
that PIMO offers practical advantages and nuanced performance insights that
redefine anomaly detection benchmarks – notably challenging the perception
that MVTec AD and VisA datasets have been solved by contemporary models.
Available on GitHub: https://github.com/jpcbertoldo/aupimo.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要