A cascaded approach for page-object detection in scientific papers

Erika Spiteri Bailey, Alexandra Bonnici, Stefania Cristina

Document Engineering (2022)

Abstract

In recent years, Page Object Detection (POD) has become a popular document understanding task, and it has proved non-trivial given the potential complexity of documents. The rise of neural networks has facilitated a more general learning approach to this task. However, in the literature, the different object classes, such as formulae or figures, are generally considered individually. In this paper, we describe the joint localisation of six object classes relevant to scientific papers, namely isolated formulae, embedded formulae, figures, tables, variables and references. Through a qualitative analysis of these object classes, we note a hierarchy among the classes and propose a new localisation approach using two cascaded You Only Look Once (YOLO) networks. We also present a new data set consisting of labelled bounding boxes for all six object classes. This data set combines two data sets commonly used in the literature for formulae localisation, adding labels for figures, tables, variables and references to the document images in these data sets. Using this data set, we achieve an average F1-score of 0.755 across all classes, which is comparable to the state of the art for the object classes when each is considered individually for localisation.
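The cascaded idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which classes each network handles, so the sketch assumes, purely for illustration, that a first-stage detector finds parent regions on the page and a second-stage detector runs inside selected parent regions and reports boxes relative to that region. The names `Detection`, `cascade_detect`, `to_page_coords` and the stub detectors are hypothetical, and the stubs stand in for trained YOLO networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Detection:
    label: str
    box: Box
    score: float


# Hypothetical detector interface: given a region of the page, return
# detections whose boxes are relative to that region's top-left corner.
Detector = Callable[[Box], List[Detection]]


def to_page_coords(child: Detection, parent_box: Box) -> Detection:
    """Shift a region-relative box back into page coordinates."""
    px, py, _, _ = parent_box
    x0, y0, x1, y1 = child.box
    return Detection(child.label, (x0 + px, y0 + py, x1 + px, y1 + py), child.score)


def cascade_detect(
    page_box: Box,
    stage_one: Detector,
    stage_two: Detector,
    parent_labels: Set[str],
) -> List[Detection]:
    """Run stage one on the page, then stage two inside each parent region."""
    results: List[Detection] = []
    for det in stage_one(page_box):
        results.append(det)
        if det.label in parent_labels:
            for child in stage_two(det.box):
                results.append(to_page_coords(child, det.box))
    return results


# Stub detectors with made-up outputs, standing in for trained networks.
def stub_stage_one(region: Box) -> List[Detection]:
    return [
        Detection("isolated_formula", (10.0, 20.0, 110.0, 60.0), 0.9),
        Detection("figure", (0.0, 100.0, 200.0, 300.0), 0.8),
    ]


def stub_stage_two(region: Box) -> List[Detection]:
    return [Detection("variable", (5.0, 5.0, 15.0, 15.0), 0.7)]


detections = cascade_detect(
    (0.0, 0.0, 600.0, 800.0),
    stub_stage_one,
    stub_stage_two,
    parent_labels={"isolated_formula"},
)
for d in detections:
    print(d.label, d.box, d.score)
```

In this sketch the second stage only sees regions that the first stage has already localised, which is one way to exploit a hierarchy among classes. The coordinate shift in `to_page_coords` is what lets both stages report results in a single page-level frame.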