Segmentation of offline handwritten Arabic text

2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR)(2017)

Abstract
Arabic script is cursive in both printed and handwritten forms, and this inherent cursiveness makes segmentation challenging. An Arabic word generally consists of multiple parts known as Parts of Arabic Words (PAWs), or simply sub-words. Sub-words frequently share the same vertical space, which makes the vertical projection segmentation technique ineffective. Several Arabic letters have annexed parts (diacritics) located above or below the main part of the character, and the relative positions of the annexed and main parts vary considerably in handwritten text. This paper addresses the task of segmenting offline handwritten Arabic text down to the character level. First, graph-theoretic modeling is used to extract the connected components of the word image. These components are analyzed to segment the input image into sub-words. Diacritics are then removed. Next, a large number of candidate segmentation points are identified using two strategies that rely on stroke thickness as a heuristic. The final segmentation points are obtained by applying a set of rules to the candidate points. Finally, each sub-word is segmented and the diacritics are restored to their respective segments, taking diacritic displacement into account. Experiments are conducted on a set of handwritten Arabic text images drawn from the IFN/ENIT dataset. The results obtained are encouraging.
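To illustrate why overlapping sub-words defeat vertical projection while connected-component analysis still separates them, here is a minimal sketch. It is not the paper's implementation: the abstract does not specify its graph-theoretic model, so the breadth-first search over an 8-connected pixel graph, the function names `connected_components` and `vertical_projection_gaps`, and the toy `IMAGE` are all assumptions made for illustration.

```python
from collections import deque

def connected_components(img):
    """Return the 8-connected foreground components of a binary image.

    `img` is a list of rows of 0/1 values. Pixels are treated as graph nodes
    and 8-neighbour adjacency as edges (an assumed model, not the paper's).
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first search from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(component)
    return components

def vertical_projection_gaps(img):
    """Return the indices of columns that contain no ink."""
    return [c for c in range(len(img[0])) if not any(row[c] for row in img)]

# Hypothetical toy image: two strokes that overlap in column 2 but never touch.
IMAGE = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]

print(len(connected_components(IMAGE)))   # number of separate strokes found
print(vertical_projection_gaps(IMAGE))    # empty columns usable as cut points
```

In this toy image the column profile has no empty column, so a projection-based splitter finds no cut point, whereas the component search still reports two separate strokes. Grouping such components into sub-words and handling diacritics would need the further analysis the abstract describes.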
Keywords
Arabic sub-word segmentation, Overlapping Arabic sub-words, Arabic Character Segmentation, Handwritten Arabic text recognition