Arxiv Tables: Document Understanding Challenge Linking Texts and Tables.

Karolina Konopka, Michal Turski, Filip Gralinski

ICDAR Workshops (2) (2023)

Abstract
We introduce Arxiv Tables, a novel Document Understanding challenge that focuses on tables in relation to text passages. To build the dataset, we leverage arXiv, a large open-access archive of scholarly papers. We use both the LaTeX source code and the graphical renderings of papers, and we combine tables with their references in the main text to create a quasi-Question Answering dataset by masking selected fragments of the table. Three properties distinguish the dataset: (1) the domain is science, (2) the input texts are longer than in typical Document Understanding Question Answering tasks, and (3) both the input and the output contain non-standard characters used in scientific notation. To ease comparison in future research using this dataset, we also provide strong baselines.