SESAME - Self-supervised framework for Extractive queStion Answering over docuMent collEctions

crossref(2024)

引用 0|浏览0
暂无评分
摘要
Abstract Question Answering is one of the most relevant areas in the field of Natural Language Processing, rapidly evolving with promising results due to the increasing availability of suitable datasets and the advent of new technologies, such as Generative Models. This article introduces SESAME, a Self-supervised framework for Extractive queStion Answering over docuMent collEctions. SESAME aims to enhance open-domain question answering systems (ODQA) by leveraging domain adaptation with synthetic datasets, enabling efficient question answering over private document collections with low resource usage. The framework incorporates recent advances with large language models, and an efficient hybrid method for context retrieval. We conducted several sets of experiments with Machine Reading for Question Answering (MRQA) 2019 Shared Task datasets, FAQuAD - a Brazilian Portuguese reading comprehension dataset, Wikipedia, and Retrieval-Augmented Generation Benchmark, to demonstrate SESAME's effectiveness. The results indicate that SESAME's domain adaptation using synthetic data significantly improves QA performance, generalizes across different domains and languages, and competes with or surpasses state-of-the-art systems in ODQA.Finally, SESAME is an open-source tool, and all code, datasets and experimental data is available for public use in our repository.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要