High Recall, Small Data: The Challenges of Within-System Evaluation in a Live Legal Search System
arXiv (2024)
Abstract
This paper illustrates some challenges of common ranking evaluation methods
for legal information retrieval (IR). We show these challenges with log data
from a live legal search system and two user studies. We provide an overview of
the characteristics of legal IR, and the implications of these characteristics
for the expected challenges of common evaluation methods: test collections
based on explicit and implicit feedback, user surveys, and A/B testing. Next,
we illustrate the challenges of common evaluation methods using data from a
live, commercial legal search engine. We specifically focus on methods for
monitoring the effectiveness of (continuous) changes to document ranking by a
single IR system over time. We show how the combination of characteristics in
legal IR systems and limited user data can lead to challenges that make the
common evaluation methods discussed suboptimal. In our future work we will
therefore focus on less common evaluation methods, such as cost-based
evaluation models.