UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents
CoRR (2024)
Abstract
Existing methods for Visual Information Extraction (VIE) from form-like
documents typically fragment the process into separate subtasks, such as key
information extraction, key-value pair extraction, and choice group extraction.
However, these approaches often overlook the hierarchical structure of form
documents, including hierarchical key-value pairs and hierarchical choice
groups. To address these limitations, we present a new perspective, reframing
VIE as a relation prediction problem and unifying labels of different tasks
into a single label space. This unified approach allows for the definition of
various relation types and effectively tackles hierarchical relationships in
form-like documents. In line with this perspective, we present UniVIE, a
unified model that addresses the VIE problem comprehensively. UniVIE follows
a coarse-to-fine strategy: a tree proposal network first generates tree
proposals, which a relation decoder module then refines into hierarchical
trees. To enhance the relation prediction capabilities
of UniVIE, we incorporate two novel tree constraints into the relation decoder:
a tree attention mask and a tree level embedding. Extensive experimental
evaluations on both our in-house dataset, HierForms, and the publicly available
dataset SIBR show that our method achieves state-of-the-art results,
underscoring the effectiveness and potential of our unified approach in
advancing the field of VIE.
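
To make the relation-prediction framing concrete, the following is a minimal sketch of how relations drawn from one shared label set could be assembled into a hierarchical tree. It is not the UniVIE model: the label names in `RELATION_LABELS`, the `build_trees` helper, and the toy form are all hypothetical, and the relations are assumed to be already predicted and acyclic.

```python
from collections import defaultdict

# Hypothetical unified label space: one set of relation labels shared by
# all subtasks (these names are illustrative, not taken from the paper).
RELATION_LABELS = ("no_relation", "key_value", "choice_group", "key_key")


def build_trees(entities, relations):
    """Assemble trees from (parent_id, child_id, label) relation triples.

    Assumes the relations are acyclic and each child has one parent.
    """
    children = defaultdict(list)
    has_parent = set()
    for parent, child, label in relations:
        if label not in RELATION_LABELS:
            raise ValueError(f"unknown relation label: {label}")
        if label == "no_relation":
            continue  # entity pairs without a relation add no edge
        children[parent].append((child, label))
        has_parent.add(child)

    def expand(node, label=None):
        # Each node records the relation that links it to its parent.
        return {
            "text": entities[node],
            "relation": label,
            "children": [expand(c, l) for c, l in children[node]],
        }

    # Roots are entities that never appear as a child.
    roots = [e for e in entities if e not in has_parent]
    return [expand(r) for r in roots]


# Toy form: a section header with a nested key-value pair and a choice group.
entities = {0: "Applicant", 1: "Name", 2: "Jane Doe",
            3: "Gender", 4: "Female", 5: "Male"}
relations = [
    (0, 1, "key_key"), (1, 2, "key_value"),
    (0, 3, "key_key"), (3, 4, "choice_group"), (3, 5, "choice_group"),
    (2, 5, "no_relation"),
]
trees = build_trees(entities, relations)
```

Because every edge carries a label from the same set, nested key-value pairs and choice groups end up in one tree rather than in separate per-task outputs, which is the point of the unified label space described above.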