Extract Data Points from Invoices with Multi-layer Graph Attention Network and Named Entity Recognition

2022 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), 2022

Abstract
Extracting key information and data points from business documents, such as invoices and bank statements, is important for office automation, especially in accounting and finance. Compared with general documents, invoices are highly structured and often have complex layouts with tables and text boxes, which makes typical Natural Language Processing models less effective on the plain text converted directly from the original invoices. To address this issue, many approaches adopt graph-based networks that propagate node embeddings and reveal latent connections between tokens or text fragments. In this work, we focus on extracting data points from invoices as a sub-task of Information Extraction (IE), and model data point extraction as a node-level classification task. We propose the Stacked Propagation Network, based on the Graph Attention Network (GAT), to facilitate the propagation of node embeddings according to different edge maps. In addition, Named Entity Recognition (NER) is adopted to improve the performance of data point extraction. Extensive experiments on a real-world dataset show the effectiveness of the proposed approach, and ablation studies evaluate the influence of each component.
Keywords
Information Extraction, Document Analysis, Data Points, Graph Attention Network, Named Entity Recognition
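
The abstract frames data point extraction as node-level classification over a graph of text fragments, with GAT-style propagation along different edge maps. The sketch below is one possible reading of that idea, not the paper's implementation: it applies the standard single-head graph attention update once per edge map, in sequence, and then classifies each node with a linear layer. The fragment features, the two edge maps ("horizontal" and "vertical" neighbours), the layer sizes, the tanh non-linearity, and all function names are assumptions made for illustration; the weights are random and untrained, and the NER component is omitted.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(h, W, a, neighbours):
    """Single-head graph attention update over one edge map.

    h: node features; W: projection matrix; a: attention vector of
    length 2 * out_dim; neighbours: dict node index -> list of indices.
    """
    z = [matvec(W, x) for x in h]
    out = []
    for i in range(len(h)):
        nbrs = sorted(set(neighbours.get(i, [])) | {i})  # add a self-loop
        # e_ij = LeakyReLU(a . [z_i || z_j]), normalised over the neighbourhood
        scores = [
            leaky_relu(sum(w * v for w, v in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        alpha = softmax(scores)
        agg = [
            sum(al * z[j][k] for al, j in zip(alpha, nbrs))
            for k in range(len(z[i]))
        ]
        out.append([math.tanh(v) for v in agg])  # tanh is an assumed choice
    return out


def stacked_propagation(h, layers):
    """Apply one attention layer per edge map, in sequence (assumed design)."""
    for W, a, neighbours in layers:
        h = gat_layer(h, W, a, neighbours)
    return h


def classify_nodes(h, W_out):
    """Assign each node the class with the highest linear score."""
    return [max(range(len(W_out)), key=lambda c: matvec(W_out, x)[c]) for x in h]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Four hypothetical text fragments, e.g. "Invoice No.", "INV-001", "Total", "120.00".
features = random_matrix(4, 3, rng)

# Two hypothetical edge maps: same-row neighbours and same-column neighbours.
horizontal = {0: [1], 1: [0], 2: [3], 3: [2]}
vertical = {0: [2], 2: [0], 1: [3], 3: [1]}

layers = [
    (random_matrix(4, 3, rng), [rng.uniform(-1.0, 1.0) for _ in range(8)], horizontal),
    (random_matrix(4, 4, rng), [rng.uniform(-1.0, 1.0) for _ in range(8)], vertical),
]

hidden = stacked_propagation(features, layers)
labels = classify_nodes(hidden, random_matrix(3, 4, rng))  # 3 hypothetical classes
print(labels)
```

Because each layer sees a different edge map, a fragment can first gather context from the fragments on its row and then from those in its column. Whether the paper stacks its layers in this order, or combines edge maps in some other way, is not stated in the abstract.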