Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia
CoRR (2024)
Abstract
AI tools are increasingly deployed in community contexts. However, datasets
used to evaluate AI are typically created by developers and annotators outside
a given community, which can yield misleading conclusions about AI performance.
How might we empower communities to drive the intentional design and curation
of evaluation datasets for AI that impacts them? We investigate this question
on Wikipedia, an online community with multiple AI-based content moderation
tools deployed. We introduce Wikibench, a system that enables communities to
collaboratively curate AI evaluation datasets, while navigating ambiguities and
differences in perspective through discussion. A field study on Wikipedia shows
that datasets curated using Wikibench can effectively capture community
consensus, disagreement, and uncertainty. Furthermore, study participants used
Wikibench to shape the overall data curation process, including refining label
definitions, determining data inclusion criteria, and authoring data
statements. Based on our findings, we propose future directions for systems
that support community-driven data curation.