Desirable Properties for Diversity and Truncated Effectiveness Metrics.

ADCS'18: PROCEEDINGS OF THE 23RD AUSTRALASIAN DOCUMENT COMPUTING SYMPOSIUM(2018)

引用 12|浏览54
暂无评分
摘要
A wide range of evaluation metrics have been proposed to measure the quality of search results, including in the presence of diversification. Some of these metrics have been adapted for use in search tasks with different complexities, such as where the search system returns lists of different lengths. Given the range of requirements, it can be difficult to compare the behavior of these metrics. In this work, we examine effectiveness metrics using a simple property-based approach. In particular, we present a case-analysis framework to define and study fundamental properties that seem integral to any evaluation metric. An example of a simple property is that a ranking with only one non-relevant document should never score lower than a ranking with two non-relevant documents. The framework facilitates quantifying the ability of metrics to satisfy properties, both separately and simultaneously, and to identify those cases where properties are violated. Our analysis shows that the Average Cube Test and Intent-Aware Average Precision are two metrics which fail to satisfy the desirable properties, and hence should be used with caution.
更多
查看译文
关键词
Evaluation,Search result diversification,Axiomatic analysis
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要