JDocQA: Japanese Document Question Answering Dataset for Generative Language Models
arxiv(2024)
摘要
Document question answering is a task of question answering on given
documents such as reports, slides, pamphlets, and websites, and it is a truly
demanding task as paper and electronic forms of documents are so common in our
society. This is known as a quite challenging task because it requires not only
text understanding but also understanding of figures and tables, and hence
visual question answering (VQA) methods are often examined in addition to
textual approaches. We introduce Japanese Document Question Answering (JDocQA),
a large-scale document-based QA dataset, essentially requiring both visual and
textual information to answer questions, which comprises 5,504 documents in PDF
format and annotated 11,600 question-and-answer instances in Japanese. Each QA
instance includes references to the document pages and bounding boxes for the
answer clues. We incorporate multiple categories of questions and unanswerable
questions from the document for realistic question-answering applications. We
empirically evaluate the effectiveness of our dataset with text-based large
language models (LLMs) and multimodal models. Incorporating unanswerable
questions in finetuning may contribute to harnessing the so-called
hallucination generation.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要