A Comparative Study of Information Extraction Strategies Using an Attention-Based Neural Network

DOCUMENT ANALYSIS SYSTEMS, DAS 2022 (2022)

Abstract
This article focuses on information extraction from historical handwritten marriage records. Traditional approaches rely on a sequential pipeline of two consecutive tasks: handwriting recognition is applied first, followed by named entity recognition. More recently, joint approaches that handle both tasks simultaneously have been investigated, yielding state-of-the-art results. However, because these approaches were evaluated under different experimental conditions, they have not yet been compared fairly. In this work, we conduct a comparative study of sequential and joint approaches based on the same attention-based architecture, in order to quantify the gain that can be attributed to the joint learning strategy. We also investigate three new joint learning configurations based on multi-task or multi-scale learning. Our study shows that relying on a joint learning strategy can lead to an 8% increase in the complete recognition score. We also highlight the value of multi-task learning and demonstrate the benefit of attention-based networks for information extraction. Our work achieves state-of-the-art performance in the ICDAR 2017 Information Extraction competition on the Esposalles database at line level, without any language modelling or post-processing.
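To make the difference between the two strategies concrete, the following is a minimal sketch of how training targets might be built for each, assuming word-level entity annotations. The tag format (`<category>` prefixed to a word), the category names, and the helper functions are hypothetical illustrations, not the authors' actual encoding. In the sequential case, the recognizer learns a plain transcription and a separate NER model labels each word. In the joint case, a single model learns one character sequence in which entity tags appear as extra symbols, so transcription and tagging are learned together.

```python
from typing import List, Optional, Tuple

# Hypothetical annotation format: each word paired with an entity category,
# or None when the word is not part of a named entity.
Word = Tuple[str, Optional[str]]


def sequential_targets(words: List[Word]) -> Tuple[str, List[str]]:
    """Sequential strategy: the handwriting recognizer is trained on the plain
    transcription, and a separate NER model is trained on one label per word
    ("O" marks words outside any entity)."""
    transcription = " ".join(word for word, _ in words)
    labels = [tag if tag is not None else "O" for _, tag in words]
    return transcription, labels


def joint_target(words: List[Word]) -> str:
    """Joint strategy (one possible encoding, assumed here): a single model is
    trained on a character sequence where each entity word is prefixed by a
    tag token such as "<name>"."""
    parts = [f"<{tag}>{word}" if tag is not None else word for word, tag in words]
    return " ".join(parts)


def parse_joint(prediction: str) -> List[Word]:
    """Recover (word, tag) pairs from a string produced by the joint encoding."""
    result: List[Word] = []
    for token in prediction.split():
        if token.startswith("<") and ">" in token:
            # "<name>Joan" -> tag "name", word "Joan"
            tag, word = token[1:].split(">", 1)
            result.append((word, tag))
        else:
            result.append((token, None))
    return result


# Hypothetical example line with illustrative category names.
example: List[Word] = [
    ("Joan", "name"),
    ("Ferrer", "surname"),
    ("pages", "occupation"),
    ("de", None),
    ("Sabadell", "location"),
]

if __name__ == "__main__":
    print(sequential_targets(example))
    print(joint_target(example))
    print(parse_joint(joint_target(example)))
```

With this kind of encoding, a recognition error and a tagging error on the same word show up in the same output sequence, which is what allows a joint model to learn both tasks with a single objective.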
Keywords
Document image analysis, Historical documents, Information extraction, Handwriting recognition, Named entity recognition