Inconsistency of evaluation metrics in link prediction
CoRR(2024)
Abstract
Link prediction is a paradigmatic and challenging problem in network science:
the task is to predict missing links, future links and temporal links from known
topology. As the number of link prediction algorithms grows, a
critical yet previously ignored risk is that the evaluation metrics for
algorithm performance are usually chosen at will. This paper reports
extensive experiments on hundreds of real networks and 25 well-known
algorithms, revealing statistically significant inconsistency among evaluation
metrics: different metrics are likely to produce remarkably different
rankings of algorithms. We therefore conclude that no single metric can
comprehensively or credibly evaluate algorithm performance. Further analysis
suggests using at least two metrics: one is the area under the receiver
operating characteristic curve (AUC), and the other is one of the following
three candidate metrics: the area under the precision-recall curve (AUPR),
the area under the precision curve (AUC-Precision), or the normalized
discounted cumulative gain (NDCG). In addition, because we have proved the
essential equivalence of threshold-dependent metrics, if specific thresholds
are meaningful in a given link prediction task, any one threshold-dependent
metric can be used with those thresholds. This work fills in a missing part of the
landscape of link prediction, and provides a starting point toward a
well-accepted criterion or standard to select proper evaluation metrics for
link prediction.
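
To make the notion of metric inconsistency concrete, the following is a minimal sketch on toy data, not taken from the paper's experiments. Two hypothetical algorithms, A and B, each rank the same 10 candidate links, 3 of which are true links. The rankings, the function names, the use of average precision as a step-wise estimate of AUPR, and the binary-relevance NDCG with a log2 discount are all assumptions made for illustration.

```python
import math

# Hypothetical ranked outputs (best-scored candidate first); 1 = true link, 0 = non-link.
RANKINGS = {
    "A": [1, 0, 0, 0, 0, 0, 0, 0, 1, 1],  # one hit at the very top, two at the bottom
    "B": [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],  # all hits near the top, none in first place
}

def auc(ranked):
    """Fraction of (positive, negative) pairs in which the positive is ranked higher."""
    n_pos = sum(ranked)
    n_neg = len(ranked) - n_pos
    wins = 0
    negs_below = 0
    # Walk from the bottom of the list upward, counting negatives below each positive.
    for label in reversed(ranked):
        if label == 1:
            wins += negs_below
        else:
            negs_below += 1
    return wins / (n_pos * n_neg)

def average_precision(ranked):
    """Mean of precision@k taken at the rank k of each positive (an AUPR estimate)."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

def ndcg(ranked):
    """Binary-relevance NDCG with a log2 position discount."""
    dcg = sum(label / math.log2(rank + 1)
              for rank, label in enumerate(ranked, start=1))
    ideal = sum(1 / math.log2(rank + 1)
                for rank in range(1, sum(ranked) + 1))
    return dcg / ideal

if __name__ == "__main__":
    for name, ranked in RANKINGS.items():
        print(name,
              round(auc(ranked), 3),
              round(average_precision(ranked), 3),
              round(ndcg(ranked), 3))
```

On this toy input, AUC ranks B above A, while average precision and NDCG both rank A above B, so the choice of metric alone decides which algorithm looks better.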