A Text Line Extraction Method For Archival Document Transcription

PROCEEDINGS OF THE 2020 17TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD 2020), 2020

Abstract
Enriching and exploiting archival collections increasingly calls for computer-aided tools that assist researchers, historians and archivists in transcribing historical document images. Efficient transcription of handwritten and printed archival document images, however, requires robust text line segmentation. In this paper we propose a method that extracts whole text lines from archival document images. The method builds on our previous work reported at ICDAR 2019, which extracted only the main area covering the text core. This paper introduces a post-processing step, based on topological structural analysis of binary images, that extracts whole text lines, including their ascender and descender components. To illustrate the effectiveness of the proposed method, we conducted experiments on archival document images collected from the Tunisian national archives. Qualitative and quantitative results are reported and discussed.
Keywords
Historical document images, text line segmentation, topological structural analysis
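
The abstract describes growing a text-core area into a whole text line that includes ascenders and descenders, but it does not give the procedure. The following is only a minimal sketch of that general idea, not the authors' method. It assumes the core areas are already available as row ranges, and it uses plain connected-component labeling as a simple stand-in for the border-following analysis named in the abstract. The function names `connected_components` and `extend_lines`, and the `core_bands` parameter, are hypothetical.

```python
from collections import deque


def connected_components(img):
    """Return the 8-connected foreground components of a binary image.

    img is a list of rows holding 0 (background) or 1 (ink).
    Each component is returned as a list of (row, col) pixels.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps


def extend_lines(img, core_bands):
    """Grow each core band so it covers the components that belong to it.

    core_bands is a list of (top, bottom) inclusive row ranges (assumed input).
    A component is assigned to the band containing most of its pixels;
    components that touch no band are left unassigned.
    """
    if not core_bands:
        return []
    extents = [list(band) for band in core_bands]
    for pixels in connected_components(img):
        counts = [sum(1 for y, _ in pixels if top <= y <= bottom)
                  for top, bottom in core_bands]
        best = max(range(len(core_bands)), key=lambda i: counts[i])
        if counts[best] == 0:
            continue  # e.g. isolated noise outside every core band
        rows = [y for y, _ in pixels]
        extents[best][0] = min(extents[best][0], min(rows))
        extents[best][1] = max(extents[best][1], max(rows))
    return [tuple(e) for e in extents]


# Toy page: an ascender above and a descender below the first core band.
page = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
lines = extend_lines(page, [(2, 3), (6, 6)])
```

On this toy page the first band grows from rows 2 to 3 out to rows 0 to 4, while the second band is unchanged. Majority-overlap assignment is one simple design choice; components that touch two lines would need an explicit splitting rule, which this sketch does not attempt.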