LayoutQT—Layout Quadrant Tags to embed visual features for document analysis

Patricia Medyna Lauritzen de Lucena Drumond, Lindeberg Pessoa Leite, Teofilo E. de Campos, Fabricio Ataides Braz

Engineering Applications of Artificial Intelligence (2023)

Abstract
The relative position of text blocks plays a crucial role in document understanding. However, embedding layout information in the representation of a page instance is not trivial. Computer Vision and Natural Language Processing techniques have advanced in extracting content from document images while taking layout features into account. We propose a set of Layout Quadrant Tags (LayoutQT) as a new way of encoding layout information in textual embeddings. We show that this significantly enhances a standard NLP pipeline without requiring expensive mid- or high-level multimodal fusion. Because our aim is a solution with low computational cost, we focused our experiments on the AWD-LSTM neural network. We evaluated our method on page stream segmentation and document classification tasks using two datasets, Tobacco800 and RVL-CDIP. On Tobacco800, our method improved the F1 score from 97.9% to 99.1%; on RVL-CDIP, it improved the F1 score from 80.4% to 83.6%. Similar levels of performance improvement were obtained when we applied LayoutQT with BERT.
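The idea of encoding layout as tags in the text stream can be illustrated with a minimal sketch. It is not the authors' implementation: the 2x2 grid, the `[QS…]`/`[QE…]` token names, the `TextBlock` type, and the functions `quadrant_index` and `tag_blocks` are all assumptions made for illustration. The sketch maps the top-left and bottom-right corners of each OCR text block to a grid cell and wraps the block's text in the corresponding position tokens, so that an ordinary text model sees layout cues as extra vocabulary items.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical OCR text block with a bounding box in page units."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def quadrant_index(x, y, page_w, page_h, cols=2, rows=2):
    """Return the row-major index of the grid cell containing (x, y)."""
    # Clamp so that points on the right/bottom page edge fall in the last cell.
    col = min(int(x / page_w * cols), cols - 1)
    row = min(int(y / page_h * rows), rows - 1)
    return row * cols + col


def tag_blocks(blocks, page_w, page_h, cols=2, rows=2):
    """Wrap each block's text in assumed start/end position tokens."""
    parts = []
    for b in blocks:
        start = quadrant_index(b.x0, b.y0, page_w, page_h, cols, rows)
        end = quadrant_index(b.x1, b.y1, page_w, page_h, cols, rows)
        # Token format is illustrative only, not taken from the paper.
        parts.append(f"[QS{start}] {b.text} [QE{end}]")
    return " ".join(parts)


page = [
    TextBlock("Header", 10, 5, 90, 15),      # spans the top of the page
    TextBlock("Total: 42", 60, 80, 95, 90),  # bottom-right corner
]
tagged = tag_blocks(page, page_w=100, page_h=100)
print(tagged)
```

The tagged string can then be tokenized and fed to a text-only model in the usual way, which is what keeps this style of approach cheap compared with multimodal fusion.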
Keywords
LayoutQT (Layout Quadrant Tags), document analysis, visual features