SmartATID: A Mobile Captured Arabic Text Images Dataset for Multi-purpose Recognition Tasks

2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR) (2016)

Abstract
Today's smartphones can capture documents as simply and as well as a personal scanner. The captured document images then need to be processed by dedicated, automated document processing systems for textual content analysis, indexing and recognition. For instance, such systems may be used for font identification, writer identification, and word or line segmentation. The state of the art lacks a comprehensive database of Arabic document images captured by mobile phones. This paper presents the first public offline image database of both printed and handwritten Arabic documents captured with mobile phones, named "SmartATID". The document images in the database are acquired under varying capture conditions (blur, perspective angle and lighting). These conditions cause photometric and geometric distortions that affect not only the performance of the OCR process but also the segmentation of pages into lines and paragraphs. Each document image in the database comes with a ground truth file that contains the exact text transcription and all numerical capture parameters used for that capture. The database is freely and publicly available to the research community at http://sites.google.com/site/smartatid.
Keywords
Smartphone Arabic document capture database, mobile OCR, capture-based distortions