Deep Learning Based Sinhala Optical Character Recognition (OCR)

International Conference on Advances in ICT for Emerging Regions (2020)

Abstract
With the advancement of computer technology over the last few years, researchers have applied machine learning and deep learning techniques to analyse textual content in digital documents. As a result, Optical Character Recognition (OCR) technology has been widely adopted to convert printed text into machine-readable text for different character sets. Sinhala is an abugida script with its own writing system, used to write the Sinhala and Pali languages. The complexity of the Sinhala script makes it hard to develop an OCR system for it. In recent literature, most research groups try to reduce this complexity with the support of computer science techniques and neural networks [1], [2]. Tesseract is an open-source, deep-learning based OCR engine developed by Google [3]. Although the engineering aspects have been researched for decades, our work attempts to improve the accuracy of Sinhala character recognition using deep learning techniques.
Keywords
Sinhala OCR, Optical Character Recognition, Tesseract, Deep learning