Alignment Relation is What You Need for Diagram Parsing.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society(2024)

Abstract
As a knowledge carrier, the diagram appears in many aspects of human life, such as textbooks, architectural drawings, and documents. Unlike natural images, diagrams represent visual elements more sparsely, and similar visual representations can carry dissimilar semantics. As a result, current methods fail to capture visual elements with precise semantics. To address this issue, we treat aligned visual and textual elements as pairs, so that the precise semantics of the textual elements can be assigned to the visual elements. We build the first diagram dataset with annotations for alignment relations between visual and textual elements, named align diagram element (ADE). We also propose a visual-textual alignment model (VTAM) consisting of a graph construction phase and an optimal aligning phase. In the graph construction phase, relational graphs are constructed between different elements using four relational operators, which measure the relations between elements according to distance, connection line, inclusion, and feature similarity. In the optimal aligning phase, the representation of each visual-textual pair is refined as a weighted sum of its representations on all relational graphs. Experimental results show that VTAM improves on the current best competitor by 10.9% on the mean over the test folds of the ADE dataset. To explore the role of alignment relations in diagram parsing, we apply VTAM to diagram-related tasks such as diagram question answering (DQA), and obtain improvements of 2.8% to 5.9% on AI2D and 4.6% to 5.1% on Foodwebs after adding VTAM. Our dataset and code are released at: https://github.com/ADE-dataset/ADE-dataset.
Keywords
Diagram Parsing, Alignment Relation, Relational Operator, DQA
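
The two phases described in the abstract can be illustrated with a small toy sketch. This is not the authors' VTAM implementation: the abstract gives no formulas, so every function name, the scalar scores standing in for learned pair representations, the fixed weights, and the greedy argmax matching below are assumptions made purely for illustration. The sketch builds one score matrix per relational operator (distance, connection line, inclusion, feature similarity) and then combines them as a weighted sum to pick a visual element for each textual element.

```python
import math
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    box: tuple   # (x_min, y_min, x_max, y_max), hypothetical layout format
    feat: tuple  # toy feature vector, stands in for a learned embedding


def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def distance_op(v, t):
    # Closer centers give a score nearer to 1 (assumed form).
    (vx, vy), (tx, ty) = center(v.box), center(t.box)
    return 1.0 / (1.0 + math.hypot(vx - tx, vy - ty))


def connection_op(v, t, lines):
    # 1 if a connection line is known to link the pair, else 0.
    return 1.0 if (v.name, t.name) in lines else 0.0


def inclusion_op(v, t):
    # 1 if the center of the text box lies inside the visual box.
    tx, ty = center(t.box)
    x0, y0, x1, y1 = v.box
    return 1.0 if x0 <= tx <= x1 and y0 <= ty <= y1 else 0.0


def similarity_op(v, t):
    # Cosine similarity of the toy feature vectors.
    dot = sum(a * b for a, b in zip(v.feat, t.feat))
    norm = math.sqrt(sum(a * a for a in v.feat)) * math.sqrt(sum(b * b for b in t.feat))
    return dot / norm if norm else 0.0


def build_graphs(visuals, texts, lines):
    # Graph construction: one visual-by-text score matrix per operator.
    ops = {
        "distance": distance_op,
        "connection": lambda v, t: connection_op(v, t, lines),
        "inclusion": inclusion_op,
        "similarity": similarity_op,
    }
    return {k: [[op(v, t) for t in texts] for v in visuals] for k, op in ops.items()}


def align(visuals, texts, lines, weights):
    # Aligning: weighted sum over all graphs, then the best visual per text.
    graphs = build_graphs(visuals, texts, lines)
    result = {}
    for j, t in enumerate(texts):
        scores = [sum(weights[k] * graphs[k][i][j] for k in graphs)
                  for i in range(len(visuals))]
        best = max(range(len(visuals)), key=scores.__getitem__)
        result[t.name] = visuals[best].name
    return result


visuals = [Element("sun", (0, 0, 10, 10), (1.0, 0.0)),
           Element("leaf", (50, 50, 70, 70), (0.0, 1.0))]
texts = [Element("label_sun", (12, 2, 20, 6), (0.9, 0.1)),
         Element("label_leaf", (55, 55, 65, 60), (0.2, 0.8))]
lines = {("sun", "label_sun")}  # hypothetical detected connection line
weights = {"distance": 0.25, "connection": 0.25, "inclusion": 0.25, "similarity": 0.25}

alignment = align(visuals, texts, lines, weights)
print(alignment)
```

In this toy setting the weights are fixed by hand; in a learned model they would presumably be trained, and a one-to-one matching procedure could replace the per-text argmax if each visual element may be aligned at most once.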