Printed/Handwritten Texts and Graphics Separation in Complex Documents Using Conditional Random Fields

2018 13th IAPR International Workshop on Document Analysis Systems (DAS), 2018

Abstract
In this paper, we propose a structured-prediction-based system for text/non-text classification and printed/handwritten text separation at the connected component (CC) level in complex documents. We formulate the separation of different elements as joint classification problems and use conditional random fields (CRFs) to integrate both local and contextual information, improving classification accuracy. Both the unary and pairwise potentials are formulated as neural networks to better exploit contextual information. Because text/non-text classification and printed/handwritten text separation have different properties, we use a multilayer perceptron (MLP) for the potentials in the former and a convolutional neural network (CNN) for the potentials in the latter. To evaluate the proposed method, we provide a test-paper document database named TestPaper1.0, which can also be used for many other tasks. Our method achieves impressive results for both tasks on the TestPaper1.0 dataset. Moreover, even with very shallow CNNs as potentials, our method achieves state-of-the-art performance for writing type (printed/handwritten) separation on the highly heterogeneous Maurdor dataset, surpassing the winners of the Maurdor2013 and Maurdor2014 campaigns. This demonstrates the effectiveness and superiority of our method.
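To illustrate how a CRF combines local (unary) and contextual (pairwise) evidence when labeling connected components, here is a minimal sketch. It is not the paper's implementation: in the paper both potentials are neural networks, whereas here they are hand-picked lookup tables, the graph is a hypothetical three-component chain, and inference is brute-force enumeration. The names `crf_energy` and `map_labels` and all numeric values are assumptions made for illustration.

```python
import itertools

def crf_energy(labels, unary, pairwise, edges):
    """Energy of a labeling: sum of unary costs plus pairwise costs on edges."""
    energy = sum(unary[i][y] for i, y in enumerate(labels))
    energy += sum(pairwise[labels[i]][labels[j]] for i, j in edges)
    return energy

def map_labels(unary, pairwise, edges, n_labels=2):
    """Exhaustive minimum-energy labeling; only feasible for tiny graphs."""
    n = len(unary)
    return min(
        itertools.product(range(n_labels), repeat=n),
        key=lambda y: crf_energy(y, unary, pairwise, edges),
    )

# Hypothetical costs for three neighboring CCs; label 0 = printed, 1 = handwritten.
# Stand-ins for network outputs; lower cost means the label is more plausible.
unary = [[0.2, 1.5], [0.9, 0.8], [0.1, 2.0]]
# Potts-style pairwise cost: neighbors with different labels pay a penalty.
pairwise = [[0.0, 1.0], [1.0, 0.0]]
edges = [(0, 1), (1, 2)]

# Labeling each CC independently from its unary costs alone.
local_only = tuple(min(range(2), key=lambda y: u[y]) for u in unary)
# Joint labeling that also accounts for neighbor agreement.
joint = map_labels(unary, pairwise, edges)
```

In this toy setup the middle component is ambiguous on its own and would be labeled handwritten by a purely local classifier, while the joint labeling assigns it the same label as its confidently printed neighbors. Real systems replace the enumeration with an approximate or exact inference algorithm suited to the graph structure.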
Keywords
text/non-text, printed/handwritten, document understanding, structured prediction