CrowdDA: Difficulty-aware crowdsourcing task optimization for cleaning web tables

Expert Systems with Applications (2024)

Abstract
Web tables are rich sources of structured data for collection and analysis, but they suffer from various data quality problems, such as missing or inconsistent values. Crowdsourcing offers a solution that leverages human cognitive ability to clean tables, but existing crowdsourcing-based solutions suffer from poor cleaning quality. Because most crowd workers are not experts, the difficulty of cleaning tasks seriously affects the cleaning results. To help people clean web tables effectively and efficiently, it is important to reduce the overall difficulty of tasks. In this paper, we introduce CrowdDA, a difficulty-aware crowdsourcing task optimization system that aims to recommend the best task execution order, from easy to difficult, for the crowd, and that supports various kinds of cleaning tasks for web tables. CrowdDA takes both latency and space constraints into account for task optimization and generates the task execution order that minimizes the overall difficulty of tasks under these two constraints. Furthermore, CrowdDA adopts partition strategies for large tables to improve system efficiency, and introduces independent task sequences to tolerate inconsistent answers from the crowd, improving system robustness. Experiments on real-world datasets demonstrate the superiority of CrowdDA in improving the cleaning quality of web tables.
Keywords
Crowdsourcing, Table cleaning, Task difficulty, Order optimization
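
The easy-to-difficult ordering mentioned in the abstract can be illustrated with a minimal sketch. This is not CrowdDA's algorithm: the abstract does not say how difficulty is measured or how the latency and space constraints enter the optimization, so the `CleaningTask` type, its `difficulty` score, and the `easy_to_hard` helper are all hypothetical names introduced here. The sketch only shows the basic idea of presenting tasks to the crowd in ascending order of an assumed difficulty score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleaningTask:
    """Hypothetical cleaning task; the fields are illustrative assumptions."""
    task_id: str
    kind: str          # e.g. "fill_missing" or "resolve_inconsistency"
    difficulty: float  # assumed score; a higher value means a harder task


def easy_to_hard(tasks):
    """Return tasks sorted by ascending difficulty, breaking ties by task_id."""
    return sorted(tasks, key=lambda t: (t.difficulty, t.task_id))


tasks = [
    CleaningTask("t1", "resolve_inconsistency", 0.8),
    CleaningTask("t2", "fill_missing", 0.2),
    CleaningTask("t3", "fill_missing", 0.5),
]

order = [t.task_id for t in easy_to_hard(tasks)]
print(order)
```

A plain sort ignores the latency and space constraints and the table partitioning that the abstract describes; those would need the details given in the full paper.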