FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
arxiv(2024)
摘要
Modern Large Language Models (LLMs) are capable of following long and complex
instructions that enable a diverse amount of user tasks. However, despite
Information Retrieval (IR) models using LLMs as the backbone of their
architectures, nearly all of them still only take queries as input, with no
instructions. For the handful of recent models that do take instructions, it's
unclear how they use them. We introduce our dataset FollowIR, which contains a
rigorous instruction evaluation benchmark as well as a training set for helping
IR models learn to better follow real-world instructions. FollowIR builds off
the long history of the TREC conferences: as TREC provides human annotators
with instructions (also known as narratives) to determine document relevance,
so should IR models be able to understand and decide relevance based on these
detailed instructions. Our evaluation benchmark starts with three deeply judged
TREC collections and alters the annotator instructions, re-annotating relevant
documents. Through this process, we can measure how well IR models follow
instructions, through a new pairwise evaluation framework. Our results indicate
that existing retrieval models fail to correctly use instructions, using them
for basic keywords and struggling to understand long-form information. However,
we show that it is possible for IR models to learn to follow complex
instructions: our new FollowIR-7B model has significant improvements (over 13
after fine-tuning on our training set.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要