Crowdsourcing in the document processing practice

ICWE'10: Proceedings of the 10th international conference on Current trends in web engineering (2010)

Abstract
The processing of scanned documents calls for automatic recognition of the text by OCR (Optical Character Recognition) computer programs, followed by human validation and correction. Crowdsourcing these essential manual tasks is a good option, provided one can address some key challenges so that the quality level expected by the customer is met. We show how tools for efficient validation and correction are adapted and enhanced to address issues associated with crowdsourcing, such as data privacy, quality control, crowd monitoring, and job quality assurance. We have started to implement these ideas and technologies in our COoperative eNgine for Correction of ExtRacted Text (CONCERT), which is used in book digitization projects.
Keywords
job quality assurance, quality control, quality level, efficient validation, human validation, COoperative eNgine, ExtRacted Text, Optical Character Recognition, automatic recognition, book digitization project, document processing practice