Novel Handwritten Words and Documents Databases of Five Middle Eastern Languages

ICFHR (2014)

Abstract
This paper introduces new handwritten databases of selected words in five Middle Eastern languages: Arabic, Dari, Farsi, Pashto, and Urdu. The databases share a common lexicon of forty finance-related words that are used in daily life. The five databases were collected from over 1600 native writers located in four countries. Recognition results for each database are also presented. The results come from three classifiers (Support Vector Machines, Modified Quadratic Discriminant Function, and Multi-layer Perceptron), which were implemented to recognize the words based on gradient features. Given the diversity of the data, the results demonstrate the effectiveness of the implemented process in learning and recognizing samples of handwritten words from different languages. In addition, full-page handwritten documents are presented for each language, with approximately forty pages per language. Each document has associated ground truth information.
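The abstract does not describe how the gradient features are computed, so the following is only a minimal sketch of one common formulation: a zone-wise histogram of gradient directions weighted by gradient magnitude. The function name `gradient_features`, the 2x2 zone grid, the 8 direction bins, and the L1 normalisation are all illustrative assumptions, not the settings used in the paper.

```python
import math


def gradient_features(image, zones=2, bins=8):
    """Zone-wise histogram of gradient directions, weighted by magnitude.

    image: list of equal-length rows of grayscale values.
    zones and bins are hypothetical defaults chosen for illustration only.
    """
    height, width = len(image), len(image[0])
    features = [0.0] * (zones * zones * bins)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the horizontal and vertical gradient.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Map the gradient angle to one of `bins` direction bins.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            direction = int(angle / (2 * math.pi) * bins) % bins
            # Locate the zone of the grid that contains this pixel.
            zone_row = min(y * zones // height, zones - 1)
            zone_col = min(x * zones // width, zones - 1)
            index = (zone_row * zones + zone_col) * bins + direction
            features[index] += magnitude
    total = sum(features)
    # L1-normalise so overall contrast has less influence on the vector.
    return [v / total for v in features] if total else features


# Usage: a 6x6 image with a single vertical edge in the middle.
edge = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
vector = gradient_features(edge)
print(len(vector))
```

A fixed-length vector of this kind is the sort of input that classifiers such as SVM, MQDF, or MLP would consume; the actual feature dimensionality and preprocessing used by the authors are not stated in the abstract.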
Keywords
database, database management systems, Farsi, recognition, Dari, documents databases, Middle Eastern languages, Arabic, novel handwritten words databases, handwritten documents, isolated words, Pashto, natural language processing, Urdu, handwriting recognition, word spotting, gradient features, line extraction