Segmentation of Large Historical Manuscript Bundles into Multi-page Deeds.

IbPRIA (2023)

Abstract
Archives around the world have vast uncatalogued series of image bundles of digitized historical manuscripts containing, among others, notarial records also known as “deeds” or “acts”. One of the first steps to provide metadata which describe the contents of those bundles is to segment these bundles into their individual deeds. Even if deeds are page-aligned, as in the bundles considered in the present work, this is a time-consuming task, often prohibitive given the huge scale of the manuscript series involved. Unlike traditional Layout Analysis methods for page-level segmentation, our approach goes beyond the realm of a single-page image, providing consistent deed detection results on full bundles. This is achieved in two tightly integrated steps: first, the probabilities that each bundle image is an “initial”, “middle” or “final” page of a deed are estimated, and then an optimal sequence of page labels is computed at the whole bundle level. Empirical results are reported which show that this approach achieves almost perfect segmentation of bundles of a massive Spanish series of historical notarial records.
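
The second step described above, computing an optimal sequence of page labels for a whole bundle from per-page probabilities, can be illustrated with a small Viterbi-style dynamic program. This is only a sketch under stated assumptions, not necessarily the decoder used in the paper: the labels "I", "M" and "F" stand for initial, middle and final pages; the per-page probabilities are assumed to come from some page classifier that is not shown; every deed is assumed to span at least two pages; and the bundle is assumed to start with an initial page and end with a final page. The function names `decode_bundle` and `deeds_from_labels` are hypothetical.

```python
import math

LABELS = ("I", "M", "F")  # initial, middle, final page of a deed

# Assumed consistency constraints: a deed has at least two pages, so "I" is
# followed by "M" or "F", "M" by "M" or "F", and "F" by the next deed's "I".
ALLOWED = {("I", "M"), ("I", "F"), ("M", "M"), ("M", "F"), ("F", "I")}


def decode_bundle(page_probs):
    """Return the most probable consistent label sequence for one bundle.

    page_probs: one dict per page image, mapping each label to a probability
    estimated by a page classifier (assumed, not shown here).
    """
    n = len(page_probs)
    if n < 2:
        raise ValueError("a bundle needs at least two pages under these assumptions")

    def logp(t, label):
        p = page_probs[t][label]
        return math.log(p) if p > 0 else -math.inf

    # score[t][label]: best log-probability of a consistent prefix ending in label
    score = [{label: -math.inf for label in LABELS} for _ in range(n)]
    back = [{} for _ in range(n)]
    score[0]["I"] = logp(0, "I")  # the bundle is assumed to open with an initial page

    for t in range(1, n):
        for label in LABELS:
            best_prev, best = None, -math.inf
            for prev in LABELS:
                if (prev, label) in ALLOWED and score[t - 1][prev] > best:
                    best_prev, best = prev, score[t - 1][prev]
            if best_prev is not None:
                score[t][label] = best + logp(t, label)
                back[t][label] = best_prev

    if score[n - 1]["F"] == -math.inf:
        raise ValueError("no consistent labelling ends with a final page")

    # Backtrack from the final page, which is assumed to close the last deed.
    labels = ["F"]
    for t in range(n - 1, 0, -1):
        labels.append(back[t][labels[-1]])
    labels.reverse()
    return labels


def deeds_from_labels(labels):
    """Turn a consistent label sequence into (first_page, last_page) index pairs."""
    deeds, start = [], None
    for i, label in enumerate(labels):
        if label == "I":
            start = i
        elif label == "F":
            deeds.append((start, i))
    return deeds
```

For example, with per-page probabilities whose page-by-page argmax would be I, F, M, M, F (inconsistent, because a middle page cannot follow a final page), the bundle-level decoding instead selects I, F, I, M, F, which corresponds to two deeds covering pages 0 to 1 and 2 to 4.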
Keywords
historical manuscript bundles, segmentation, multi-page