A Document Straight Line Based Segmentation For Complex Layout Extraction

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 1 (2017)

Abstract
Document layout extraction is a difficult step in the image interpretation process due to the high complexity of documents. The main challenge lies in the wide gap between the physical and logical structures of document images. In order to lose as little information as possible, most existing methods work at the pixel level. In this paper, we present a new framework for complex layout extraction based on high-level features obtained from a straight-line-based segmentation of the document. We propose to capture the straight line segments using a new transform that integrates the local spatial organization of the segments contained in the document content. This transform can be applied to either the foreground pixels (the document content) or the background pixels, in order to take advantage of the duality of information present in both parts of the document. Experimental results on the PRImA Layout Analysis dataset illustrate the robustness of our framework for extracting specific document components, including text areas, images, and separators.
Keywords
Document layout extraction, page segmentation, straight line features, Local Diameter Transforms, complex layout, sequential labelling