Improving optical character recognition performance for low quality images

Matteo Brisinello,Ratko Grbić, Matija Pul, Tihomir Andelic

2017 International Symposium ELMAR(2017)

引用 20|浏览3
暂无评分
摘要
Efficient Optical Character Recognition (OCR) in images grabbed from Set-Top Boxes (STBs) plays an important role in STB testing. However, running OCR software on such images usually ends with low OCR performance since images can have low resolution, low image quality or colorful background. In order to improve OCR performance, four different image preprocessing methods are proposed. In this paper OCR is performed with Tesseract 3.5 and the relatively new Tesseract 4.0 on the images grabbed from different STBs. On the original images Tesseract 3.5 provides a 35.7% accuracy while Tesseract 4.0 attains a 70.2% accuracy. The proposed preprocessing methods improve OCR performance by 33.3% for Tesseract 3.5 and 22.6% for Tesseract 4.0 on the available images.
更多
查看译文
关键词
OCR,Tesseract,Low Quality Images,Image Preprocessing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要