The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
arXiv (2024)
Abstract
Human feedback plays a central role in the alignment of Large Language Models
(LLMs). However, open questions remain about the methods (how), domains
(where), people (who) and objectives (to what end) of human feedback
collection. To navigate these questions, we introduce PRISM, a new dataset
which maps the sociodemographics and stated preferences of 1,500 diverse
participants from 75 countries, to their contextual preferences and
fine-grained feedback in 8,011 live conversations with 21 LLMs. PRISM
contributes (i) wide geographic and demographic participation in human feedback
data; (ii) two census-representative samples for understanding collective
welfare (UK and US); and (iii) individualised feedback where every rating is
linked to a detailed participant profile, thus permitting exploration of
personalisation and attribution of sample artefacts. We focus on collecting
conversations that centre subjective and multicultural perspectives on
value-laden and controversial topics, where we expect the most interpersonal
and cross-cultural disagreement. We demonstrate the usefulness of PRISM via
three case studies of dialogue diversity, preference diversity, and welfare
outcomes, showing that it matters which humans set alignment norms. As well as
offering a rich community resource, we advocate for broader participation in AI
development and a more inclusive approach to technology design.