Spotting Words In Handwritten Arabic Documents

DOCUMENT RECOGNITION AND RETRIEVAL XIII (2006)

Abstract
The design and performance of a system for spotting handwritten Arabic words in scanned document images are presented. The three main components of the system are a word segmenter, a shape-based matcher for words, and a search interface. The user types a query in English in a search window; the system finds the equivalent Arabic word, e.g., by dictionary look-up, and locates matching word images in an indexed (segmented) set of documents. A two-step approach is employed in performing the search: (1) prototype selection: the query is used to obtain a set of handwritten samples of that word from a known set of writers (these are the prototypes), and (2) word matching: the prototypes are used to spot each occurrence of that word in the indexed document database. A ranking is performed on the entire set of test word images, where the ranking criterion is a similarity score between each prototype word and the candidate words, based on global word shape features. A database of 20,000 word images contained in 100 scanned handwritten Arabic documents written by 10 different writers was used to study retrieval performance. Using five writers to provide prototypes and the other five for testing, on manually segmented documents, 55% precision is obtained at 50% recall. Performance increases as more writers are used for training.
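The word-matching step and the reported metric (precision at a given recall) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the features or the similarity measure, so the binary feature vectors, the fraction-of-matching-bits `similarity` function, the use of the best prototype score per candidate, and all names and data below are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Set, Tuple

# ASSUMPTION: each word image is summarized by a fixed-length binary
# vector of global word-shape features. The abstract does not say this.
Vector = Sequence[int]


def similarity(a: Vector, b: Vector) -> float:
    """Fraction of positions at which two equal-length vectors agree.

    ASSUMPTION: a stand-in for the paper's unspecified similarity score.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_candidates(
    prototypes: List[Vector], candidates: Dict[str, Vector]
) -> List[Tuple[str, float]]:
    """Score each candidate by its best match to any prototype, then
    sort from most to least similar (combining by max is an assumption)."""
    scored = [
        (cid, max(similarity(p, feat) for p in prototypes))
        for cid, feat in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def precision_at_recall(
    ranked_ids: List[str], relevant: Set[str], recall_level: float
) -> float:
    """Precision at the first rank where recall reaches recall_level."""
    hits = 0
    for rank, cid in enumerate(ranked_ids, start=1):
        if cid in relevant:
            hits += 1
            if hits / len(relevant) >= recall_level:
                return hits / rank
    return 0.0  # the requested recall level is never reached


# Hypothetical toy data: two prototypes of the query word, four candidates.
prototypes = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0, 1, 0],
]
candidates = {
    "w1": [1, 0, 1, 1, 0, 0, 1, 0],
    "w2": [0, 1, 0, 0, 1, 1, 0, 1],
    "w3": [1, 0, 1, 0, 0, 0, 1, 1],
    "w4": [1, 1, 1, 1, 0, 0, 0, 0],
}
relevant = {"w1", "w4"}  # hypothetical ground truth for the query word

ranked = rank_candidates(prototypes, candidates)
ranked_ids = [cid for cid, _ in ranked]
p_at_half = precision_at_recall(ranked_ids, relevant, 0.5)
p_at_full = precision_at_recall(ranked_ids, relevant, 1.0)
```

In this toy setting, adding prototypes from more writers can only raise a candidate's best-match score, which gives some intuition for why more training writers might help; whether that mirrors the mechanism in the actual system is not stated in the abstract.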
Keywords
databases, indexation, associative arrays, word segmentation, interfaces