Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers
arXiv (2024)
Abstract
The integration of Large Language Models (LLMs) into information retrieval has
prompted a critical reevaluation of fairness in text-ranking models. LLMs such
as the GPT models and Llama2 have shown effectiveness in natural language
understanding tasks, and prior work (e.g., RankGPT) has demonstrated that LLMs
outperform traditional ranking models on ranking tasks. However, their fairness
remains largely unexplored. This paper presents an empirical study evaluating
these LLMs on the TREC Fair Ranking dataset, focusing on the representation of
binary protected attributes, such as gender and geographic location, that are
historically underrepresented in search results. Our analysis examines how
these LLMs handle queries and documents related to these attributes, aiming to
uncover biases in their ranking behavior. We assess fairness from both user and
content perspectives, contributing an empirical benchmark for evaluating LLMs
as fair rankers.