Segmentation And Labeling Of Documents Using Conditional Random Fields

Document Recognition and Retrieval XIV (2007)

Abstract
This paper describes the use of Conditional Random Fields (CRFs), which exploit contextual information, to automatically label extracted segments of scanned documents as machine-print, handwriting, or noise. The resulting labels can serve as an indexing step for a context-based image retrieval system or a biometric signature verification system. A simple region-growing algorithm first segments the document into a number of patches. A label for each patch is then inferred using a CRF model. The model is flexible enough to treat signatures as a type of handwriting and to isolate them from machine-print and noise. The model's robustness stems from the CRF's ability to capture spatial dependencies among neighboring labels as well as in the observed data. Maximum pseudo-likelihood estimates of the CRF parameters are learned using conjugate gradient descent. Labels are inferred by computing their probabilities under the model with Gibbs sampling. Experimental results show that this approach assigns correct labels to 95.75% of the data. The CRF-based model is shown to be superior to neural networks and Naive Bayes.
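The inference step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simplified pairwise model that is not taken from the paper: each patch has a hand-set unary score per label, and neighboring patches receive a fixed bonus (`pairwise_weight`, a hypothetical parameter) when their labels agree. The function name `gibbs_label_marginals`, the score values, and the neighbor graph are all illustrative; in the paper the parameters are learned by maximum pseudo-likelihood rather than set by hand.

```python
import math
import random

# Label set named in the abstract; the index order is an arbitrary choice.
LABELS = ("machine-print", "handwriting", "noise")


def gibbs_label_marginals(unary, edges, pairwise_weight=1.0,
                          sweeps=500, burn_in=100, seed=0):
    """Estimate per-patch label marginals with Gibbs sampling.

    unary[i][c] -- hypothetical score of label c for patch i (higher is better)
    edges       -- list of (i, j) pairs of spatially neighboring patches
    """
    rng = random.Random(seed)
    n, k = len(unary), len(unary[0])

    # Build an adjacency list from the undirected edge list.
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Start from the label with the best unary score for each patch.
    labels = [max(range(k), key=lambda c: unary[i][c]) for i in range(n)]
    counts = [[0] * k for _ in range(n)]

    for sweep in range(sweeps):
        for i in range(n):
            # Conditional score of each label given the neighbors' current labels.
            scores = [
                unary[i][c]
                + pairwise_weight * sum(labels[j] == c for j in neighbours[i])
                for c in range(k)
            ]
            top = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - top) for s in scores]
            labels[i] = rng.choices(range(k), weights=weights)[0]
        if sweep >= burn_in:
            for i in range(n):
                counts[i][labels[i]] += 1

    kept = sweeps - burn_in
    return [[c / kept for c in row] for row in counts]
```

Calling the function on a chain of three patches, where the outer two clearly look like handwriting and the middle one is ambiguous, shows how neighboring context pulls the ambiguous patch toward the handwriting label. This is the kind of spatial dependency the abstract credits for the model's robustness.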
Keywords
Conditional Random Field (CRF), labeling scanned documents, handwritten text extraction