SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials
arXiv (2024)
Abstract
Interpreting complex charts with logical reasoning has recently emerged as a
challenge, driven by the development of vision-language models. A prior
state-of-the-art (SOTA) model, Deplot, has presented an end-to-end method that
leverages a vision-language model to convert charts into table format and
then uses Large Language Models (LLMs) for reasoning. However, unlike natural
images, charts mix information that is essential for chart reasoning with
information that is irrelevant to it, and we find that this characteristic can
lower the performance of chart-to-table extraction. In this paper, we introduce SIMPLOT,
a method designed to extract only the elements necessary for chart reasoning.
The proposed method involves two steps: 1) training to mimic a simple plot that
contains only the essential information from a complex chart for table
extraction, followed by 2) performing reasoning based on the table. Our model
enables accurate chart reasoning without the need for additional annotations or
datasets, and its effectiveness is demonstrated through various experiments.
Furthermore, we propose a novel prompt that addresses a shortcoming of the
recent SOTA model: it ignores visual attributes such as color. Our source code is
available at https://github.com/sangwu99/Simplot.
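
The two-step flow described above (extract a table from the chart, then reason over the table) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `linearize_table`, `build_prompt`, and `answer_chart_question`, the pipe-separated table layout, and the prompt wording are all hypothetical, and both the table extractor and the LLM are passed in as plain callables. The training step that mimics a simple plot is not shown.

```python
from typing import Callable, List

# A table is a list of rows; the first row is assumed to be the header.
Table = List[List[str]]


def linearize_table(table: Table) -> str:
    """Render a table as pipe-separated text, one row per line."""
    return "\n".join(" | ".join(row) for row in table)


def build_prompt(table: Table, question: str) -> str:
    """Compose a text prompt for the reasoning step (hypothetical wording)."""
    return (
        "Answer the question using the table.\n\n"
        f"{linearize_table(table)}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_chart_question(
    chart_image: bytes,
    question: str,
    extract_table: Callable[[bytes], Table],  # stand-in for a chart-to-table model
    reason: Callable[[str], str],             # stand-in for an LLM call
) -> str:
    """Run the two steps in order: table extraction, then reasoning."""
    table = extract_table(chart_image)             # step 1: chart -> table
    return reason(build_prompt(table, question))   # step 2: table -> answer
```

Because the extractor and the reasoner are injected, either stage can be swapped for a stub when testing the other in isolation.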