Complete OCR Solution for Image Analysis of World War 2 Documents

2022 21st RoEduNet Conference: Networking in Education and Research (RoEduNet) (2022)

Abstract
Optical Character Recognition (OCR) comprises techniques focused mainly on document image analysis. Beyond considerably speeding up everyday procedures, OCR plays an important role in preserving historical sources of information. Most World War 2 (WW2) documents are of great importance, with applications in virtual archives, museums, and research. This calls for an efficient, yet not aggressive, transcription method built on OCR tools. This paper describes one such approach. It focuses on extracting information from documents that are degraded by age but have simple structures, mainly split into paragraphs, such as letters and military reports. The approach combines the results of multiple OCR engines, with the goal of outperforming each individual engine.
Keywords
OCR, WW2 Documents, Multiple-Engine OCR, voting-based OCR, Abbyy FineReader, Tesseract OCR, Ocropus
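
The abstract does not say how the outputs of the individual engines are merged, so the following is only a minimal sketch of one possible voting scheme, not the paper's method. It assumes each engine has already produced a plain-text transcription of the same line. When all transcriptions have the same number of whitespace-separated tokens, it takes a majority vote per token position; otherwise it falls back to the transcription that is most similar to the others. The function names `vote_tokens`, `most_central`, and `combine_line` are hypothetical, and no OCR engine is actually called here.

```python
from collections import Counter
from difflib import SequenceMatcher


def vote_tokens(candidates):
    """Return the most frequent token; ties go to the earliest engine."""
    counts = Counter(candidates)
    best = max(counts.values())
    for token in candidates:
        if counts[token] == best:
            return token


def most_central(lines):
    """Return the line with the highest total similarity to the other lines."""
    def score(i):
        return sum(
            SequenceMatcher(None, lines[i], other).ratio()
            for j, other in enumerate(lines)
            if j != i
        )
    return lines[max(range(len(lines)), key=score)]


def combine_line(lines):
    """Combine transcriptions of one text line from several engines (assumed input)."""
    token_lists = [line.split() for line in lines]
    # Positional voting is only meaningful when the token counts agree.
    if len({len(tokens) for tokens in token_lists}) == 1:
        return " ".join(vote_tokens(list(column)) for column in zip(*token_lists))
    # Otherwise fall back to the transcription closest to the rest.
    return most_central(lines)


if __name__ == "__main__":
    outputs = [
        "The enemy advanced at dawn",   # hypothetical output of engine A
        "The enerny advanced at dawn",  # hypothetical output of engine B
        "The enemy advanccd at dawn",   # hypothetical output of engine C
    ]
    print(combine_line(outputs))
```

A real system would likely need a proper alignment step before voting, since engines often split or merge words differently; the similarity fallback above is a simplification for that case.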