Segmenting large historical notarial manuscripts into multi-page deeds

Pattern Analysis and Applications(2024)

引用 0|浏览0
暂无评分
摘要
Archives around the world hold vast digitized series of historical manuscript books or “bundles” containing, among others, notarial records also known as “deeds” or “acts”. One of the first steps to provide metadata which describe the contents of those bundles is to segment them into their individual deeds. Even if deeds are often page-aligned, as in the bundles considered in the present work, this is a time-consuming task, often prohibitive given the huge scale of the manuscript series involved. Unlike traditional Layout Analysis methods for page-level segmentation, our approach goes beyond the realm of a single-page image, providing consistent deed detection results on full bundles. This is achieved in two tightly integrated steps: first, we estimate the class-posterior at the page level for the “initial”, “middle”, and “final” classes; then we “decode” these posteriors applying a series of sequentiality consistency constraints to obtain a consistent book segmentation. Experiments are presented for four large historical manuscripts, varying the number of “deeds” used for training. Two metrics are introduced to assess the quality of book segmentation, one of them taking into account the loss of information entailed by segmentation errors. The problem formalization, the metrics and the empirical work significantly extend our previous works on this topic.
更多
查看译文
关键词
Handwritten document image processing,Multi-page layout analysis,Bundle segmentation,Historical manuscripts
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要