Extracting information from handwritten content in census forms

ICPR(2012)

Abstract
In this paper, we describe our approach for extracting salient information from US census form images. These forms present several challenges including variations in individual form templates, skew, writing device, writing style, etc. We describe an innovative registration algorithm that is robust to scale variations for segmenting the input image into cells. Following registration, the borders of cells are removed using a shape-based rule-line removal algorithm to extract handwritten content from each cell. Finally, the individual cell images are recognized using a hidden Markov model (HMM) OCR system with language models biased for the type of information in the cell, such as person name, place name, numbers, marital status, gender, race, etc.
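
The rule-line removal step can be illustrated with a minimal sketch. This is not the shape-based algorithm from the paper, whose details the abstract does not give. It swaps in a simple run-length heuristic: any horizontal run of ink at least `min_run` pixels long is treated as a cell border and erased, except where ink sits directly above and below it, which is taken to be a handwriting stroke crossing the line. The function name, the `min_run` threshold, the list-of-lists binary image format, and the assumption that borders are at most two pixels thick are all hypothetical choices made for illustration.

```python
def remove_horizontal_rules(image, min_run=20):
    """Return a copy of a binary image with long horizontal ink runs erased.

    image:   list of rows, each a list of 0 (background) or 1 (ink).
    min_run: hypothetical threshold; ink runs at least this long are
             treated as rule lines rather than handwriting.

    Assumes rule lines are at most two pixels thick, so a rule pixel with
    ink both directly above and directly below it is a crossing stroke.
    """
    height = len(image)
    cleaned = [row[:] for row in image]  # leave the input untouched
    for y, row in enumerate(image):
        width = len(row)
        x = 0
        while x < width:
            if row[x] != 1:
                x += 1
                continue
            # Measure the horizontal run of ink starting at x.
            start = x
            while x < width and row[x] == 1:
                x += 1
            if x - start < min_run:
                continue  # short run: assume it is handwriting
            for i in range(start, x):
                above = y > 0 and image[y - 1][i] == 1
                below = y + 1 < height and image[y + 1][i] == 1
                # Keep pixels where a stroke crosses the rule line.
                if not (above and below):
                    cleaned[y][i] = 0
    return cleaned
```

Vertical borders could be handled the same way by running the function on the transposed image. A real form image would also need deskewing first, since a skewed rule line breaks into short runs that this heuristic would miss.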
Keywords
person name, skew, marital status, US census form images, language models, input image segmentation, writing style, image segmentation, HMM, individual cell images, numbers, shape-based rule-line removal algorithm, race, salient information extraction, optical character recognition, OCR system, image registration, writing device, hidden Markov models, individual form templates, place name, innovative registration algorithm, handwritten content extraction, hidden Markov model, gender