Quality Management In Crowdsourcing Using Gold Judges Behavior

Gabriella Kazai, Imed Zitouni

WSDM (2016)

Cited 55 | Views 61

Abstract
Crowdsourcing relevance labels has become an accepted practice for the evaluation of IR systems, where the task of constructing a test collection is distributed over large populations of unknown users with widely varied skills and motivations. A typical method to check and ensure the quality of the crowd's output is to inject work tasks with known answers (gold tasks), on which workers' performance can be measured. However, gold tasks are expensive to create and have limited application. A more recent trend is to monitor workers' interactions during a task and estimate their work quality based on their behavior. In this paper, we show that without gold behavior signals that reflect trusted interaction patterns, classifiers can perform poorly, especially for complex tasks, which can lead to high-quality crowd workers getting blocked while poorly performing workers remain undetected. Through a series of crowdsourcing experiments, we compare the behaviors of trained professional judges and crowd workers, and then use the trained judges' behavior signals as gold behavior to train a classifier to detect poorly performing crowd workers. Our experiments show that classification accuracy almost doubles in some tasks with the use of gold behavior data.
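The core idea of using trained judges' behavior as a trusted reference can be sketched as follows. This is a minimal illustration, not the paper's actual method: the feature names (dwell time, click count), the nearest-centroid comparison, and the distance threshold are all assumptions chosen for clarity; the paper trains a classifier on richer behavior signals.

```python
# Hedged sketch: flag crowd workers whose behavior signals deviate from
# a "gold behavior" profile built from trained professional judges.
# Features [dwell_time_sec, clicks] and the threshold are illustrative
# assumptions, not taken from the paper.
import math

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flag_workers(gold_behavior, worker_behavior, threshold):
    """Return ids of workers whose behavior is far from the gold centroid."""
    c = centroid(gold_behavior)
    return [wid for wid, feats in worker_behavior.items()
            if euclidean(feats, c) > threshold]

# Gold behavior sessions from trained judges (assumed data).
gold = [[30.0, 5.0], [28.0, 6.0], [32.0, 5.0]]
# Crowd worker sessions: w2 rushes through tasks with almost no interaction.
workers = {"w1": [29.0, 5.0], "w2": [3.0, 1.0]}
print(flag_workers(gold, workers, threshold=10.0))  # → ['w2']
```

In practice the paper's approach replaces this single-threshold rule with a supervised classifier trained on the judges' behavior signals, which is what yields the reported near-doubling of classification accuracy on some tasks.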
Keywords
Experimentation,Measurement