A combined strategy of analysis for the localization of heterogeneous form fields in ancient pre-printed records

IJDAR(2018)

引用 3|浏览29
暂无评分
摘要
This paper deals with the location of handwritten fields in old pre-printed registers. The images present the difficulties of old and damaged documents, and we also have to face the difficulty of extracting the text due to the great interaction between handwritten and printed writing. In addition, in many collections, the structure of the forms varies according to the origin of the documents. This work is applied to a database of Mexican marriage records, which has been published for a competition in the workshop HIP 2013 and is publicly available. In this paper, we show the interest and limitations of the empirical method which has been submitted for the competition. We then present a method that combines a logical description of the contents of the documents, with the result of an automatic analysis on the physical properties of the collection. The particularity of this analysis is that it does not require any ground-truth. We show that this combined strategy can locate 97.2% of handwritten fields. The proposed approach is generalizable and could be applied to other databases.
更多
查看译文
关键词
Historical documents,Field localization,Heterogeneous layout,Rule-based system,Word spotting,Unsupervised clustering
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要