Post-OCR Error Detection by Generating Plausible Candidates.

ICDAR(2019)

引用 12|浏览9
暂无评分
摘要
The accuracy of Optical Character Recognition (OCR) technologies considerably impacts the way digital documents are indexed, accessed and exploited. Post-processing approaches detect and correct remaining errors to improve the quality of OCR texts. However, state-of-the-art approaches still need to be improved. Most of the existing post-OCR techniques use predefined error position lists or apply simple techniques to detect errors. In this paper, we describe a novel error detector using different features from character-level (including character noisy channel, index of peculiarity) to word-level (such as frequencies of n-grams, skip-grams, part-of-speech) Experimental results show that our approach outperforms the best performing techniques in the ICDAR 2017 Competition on Post-OCR text correction.
更多
查看译文
关键词
OCR error detection, post-OCR, OCRed text
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要