Feature Selection for Document Flow Segmentation

2018 13th IAPR International Workshop on Document Analysis Systems (DAS) (2018)

Abstract
In this paper, we describe a method for restoring the document boundaries within a flow of consecutive documents. The flow is a collection of consecutive scanned pages with no explicit separation marks between documents. Our method relies on contextual and layout descriptors that characterize the relationship between each pair of consecutive pages. Each relationship is represented as a vector of boolean features, each indicating the presence or absence of a descriptor on the pages concerned. The segmentation task therefore consists of classifying these vectors as continuities or breaks: a continuity indicates that both pages belong to the same document, whereas a break ends the current document and starts a new one. The experiments are conducted on a large collection of real administrative documents.
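A minimal sketch of the pipeline the abstract describes: build a boolean feature vector for each pair of consecutive pages, classify it as a continuity or a break, and cut the page flow at every break. Everything specific here is an assumption made for illustration. The four descriptors, the page fields `text` and `header`, and the Bernoulli naive Bayes classifier are hypothetical stand-ins. The abstract does not name the paper's actual descriptors or classifier, and this sketch does not implement its feature selection.

```python
from math import log
from typing import Callable, Dict, List

Page = Dict[str, str]  # hypothetical page record: {"text": ..., "header": ...}

# Hypothetical descriptors: each inspects (previous page, current page) and
# returns True when the descriptor is present on the pages concerned.
DESCRIPTORS: Dict[str, Callable[[Page, Page], bool]] = {
    "page_number_one": lambda prev, cur: "page 1" in cur["text"].lower(),
    "greeting_on_current": lambda prev, cur: cur["text"].lower().startswith("dear"),
    "signature_on_previous": lambda prev, cur: "regards" in prev["text"].lower(),
    "same_header": lambda prev, cur: prev["header"] == cur["header"],
}


def pair_vector(prev: Page, cur: Page) -> List[bool]:
    """Boolean feature vector describing the relationship between two consecutive pages."""
    return [descriptor(prev, cur) for descriptor in DESCRIPTORS.values()]


class BernoulliNB:
    """Stand-in classifier: naive Bayes over boolean features with Laplace smoothing."""

    def fit(self, X: List[List[bool]], y: List[str]) -> "BernoulliNB":
        self.classes = sorted(set(y))
        self.log_prior: Dict[str, float] = {}
        self.log_p: Dict[str, List[tuple]] = {}  # per class: (log P(x=1), log P(x=0))
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = log(len(rows) / len(X))
            probs = []
            for j in range(len(X[0])):
                p = (sum(r[j] for r in rows) + 1) / (len(rows) + 2)  # smoothed P(x_j=1 | c)
                probs.append((log(p), log(1 - p)))
            self.log_p[c] = probs
        return self

    def predict(self, x: List[bool]) -> str:
        def score(c: str) -> float:
            return self.log_prior[c] + sum(
                lp1 if v else lp0 for v, (lp1, lp0) in zip(x, self.log_p[c]))
        return max(self.classes, key=score)


def segment(pages: List[Page], clf: BernoulliNB) -> List[List[int]]:
    """Group page indices into documents, starting a new document at each predicted break."""
    if not pages:
        return []
    docs = [[0]]
    for i in range(1, len(pages)):
        if clf.predict(pair_vector(pages[i - 1], pages[i])) == "break":
            docs.append([i])  # break: close the ongoing document, open a new one
        else:
            docs[-1].append(i)  # continuity: same document
    return docs


if __name__ == "__main__":
    # Toy labelled pairs (hypothetical), features in DESCRIPTORS order.
    X = [[True, True, True, False], [True, False, True, False], [False, True, True, False],
         [False, False, False, True], [False, False, False, True], [False, False, True, True]]
    y = ["break"] * 3 + ["continuity"] * 3
    clf = BernoulliNB().fit(X, y)
    flow = [
        {"text": "Dear customer, page 1 of your contract", "header": "ACME"},
        {"text": "continued terms. Regards", "header": "ACME"},
        {"text": "Dear sir, page 1 of invoice", "header": "BANK"},
        {"text": "totals", "header": "BANK"},
    ]
    print(segment(flow, clf))  # → [[0, 1], [2, 3]]
```

Framing the task as binary classification of page pairs keeps each decision local, so a flow of n pages needs only n - 1 predictions; any classifier that accepts boolean vectors could replace the naive Bayes stand-in.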
Keywords
document flow, segmentation, classification, descriptors, continuity, break