Setting up a competition framework for the evaluation of structure extraction from OCR-ed books

Antoine Doucet, Gabriella Kazai, Bodin Dresevic, Aleksandar Uzelac, Bogdan Radakovic, Nikola Todic

International Journal on Document Analysis and Recognition (IJDAR) (2010)

Abstract
This paper describes the setup of the Book Structure Extraction competition run at ICDAR 2009. The goal of the competition was to evaluate and compare automatic techniques for deriving structure information from digitized books, which could then be used to aid navigation inside the books. More specifically, the task that participants faced was to construct hyperlinked tables of contents for a collection of 1,000 digitized books. This paper describes the setup of the competition and its challenges. It introduces and discusses the book collection used in the task, the collaborative construction of the ground truth, the evaluation measures, and the evaluation results. The paper also introduces a data set to be used freely for research evaluation purposes.
Keywords
Ground Truth, Optical Character Recognition, Portable Document Format, Document Type Definition, Structure Extraction