Textual-Content-Based Classification Of Bundles Of Untranscribed Manuscript Images

2020 25th International Conference on Pattern Recognition (ICPR), 2020

Abstract
Content-based classification of manuscripts is an important task that is generally performed in archives and libraries by experts with a wealth of knowledge of the manuscripts' contents. Unfortunately, many manuscript collections are so vast that it is not feasible to rely solely on experts to perform this task. Current approaches to textual-content-based manuscript classification generally require the handwritten images to be transcribed into text first, but obtaining sufficiently accurate transcripts is generally infeasible for large sets of historical manuscripts. We propose a new approach that performs this classification task automatically and does not rely on any explicit image transcripts. It is based on "probabilistic indexing", a relatively novel technology that effectively represents the intrinsic word-level uncertainty generally exhibited by handwritten text images. We assess the performance of this approach on a large collection of complex manuscripts from the Spanish Archivo General de Indias, with promising results. To the best of our knowledge, this is the first published work proposing, developing and assessing a successful approach for content-based classification of untranscribed manuscript images.
Keywords
archives, libraries, manuscript collections, textual-content-based manuscript classification, handwritten images, sufficiently accurate transcripts, historical manuscripts, classification task, explicit image transcripts, handwritten text images, complex manuscripts, Spanish Archivo General de Indias, untranscribed manuscript images, textual-content-based classification
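
The abstract does not spell out how word-level uncertainty feeds a classifier. As a minimal illustrative sketch, and not the authors' stated method, one could assume that a probabilistic index yields (word hypothesis, relevance probability) pairs for each bundle, and that summing those probabilities gives an expected word count usable as a bag-of-words feature. The function name, the data layout, and the sample entries below are all hypothetical.

```python
from collections import defaultdict


def expected_counts(spots):
    """Sum relevance probabilities per word into expected word counts.

    `spots` is an assumed layout: an iterable of (word, probability) pairs,
    one per word hypothesis found in the images of a bundle.
    """
    counts = defaultdict(float)
    for word, prob in spots:
        counts[word.lower()] += prob
    return dict(counts)


# Hypothetical index entries for one bundle (made-up words and probabilities).
bundle_spots = [
    ("navio", 0.9),
    ("Navio", 0.6),
    ("nabio", 0.2),
    ("testamento", 0.4),
]

counts = expected_counts(bundle_spots)
```

Under these assumptions, uncertain hypotheses contribute fractionally instead of being forced into a single transcript, and the resulting vector could be passed to any standard text classifier.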