Preference-Based Offline Evaluation.

WSDM(2023)

引用 2|浏览31
暂无评分
摘要
A core step in production model research and development involves the offline evaluation of a system before production deployment. Traditional offline evaluation of search, recommender, and other systems involves gathering item relevance labels from human editors. These labels can then be used to assess system performance using offline evaluation metrics. Unfortunately, this approach does not work when evaluating highly effective ranking systems, such as those emerging from the advances in machine learning. Recent work demonstrates that moving away from pointwise item and metric evaluation can be a more effective approach to the offline evaluation of systems. This tutorial, intended for both researchers and practitioners, reviews early work in preference-based evaluation and covers recent developments in detail.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要