Distant Supervision for Multi-Stage Fine-Tuning in Retrieval-Based Question Answering

WWW '20: The Web Conference 2020, Taipei, Taiwan, April 2020

Cited by 25 | Views 119
Abstract
We tackle the problem of question answering directly on a large document collection, combining simple “bag of words” passage retrieval with a BERT-based reader for extracting answer spans. In the context of this architecture, we present a data augmentation technique using distant supervision to automatically annotate paragraphs as either positive or negative examples to supplement existing training data, which are then used together to fine-tune BERT. We explore a number of details that are critical to achieving high accuracy in this setup: the proper sequencing of different datasets during fine-tuning, the balance between “difficult” vs. “easy” examples, and different approaches to gathering negative examples. Experimental results show that, with the appropriate settings, we can achieve large gains in effectiveness on two English and two Chinese QA datasets. We are able to achieve results at or near the state of the art without any modeling advances, which once again affirms the cliché “there’s no data like more data”.
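The distant-supervision step the abstract describes can be illustrated with a short sketch: a retrieved paragraph is annotated as a positive example if it contains the gold answer string, and as a negative example otherwise; these automatically labeled paragraphs then supplement the existing training data for fine-tuning BERT. The function and class names below are hypothetical, for illustration only, and the paper's actual pipeline (retrieval settings, dataset sequencing, balancing of "difficult" vs. "easy" negatives) is not reproduced here.

```python
# Minimal sketch of distant-supervision labeling, assuming paragraphs have
# already been retrieved for a question by a "bag of words" retriever.
# Names (LabeledParagraph, distant_label) are illustrative assumptions,
# not the authors' actual code.

from dataclasses import dataclass
from typing import List


@dataclass
class LabeledParagraph:
    text: str
    label: int  # 1 = positive (contains the answer string), 0 = negative


def distant_label(paragraphs: List[str], answer: str) -> List[LabeledParagraph]:
    """Annotate retrieved paragraphs by simple answer-string matching."""
    answer_lower = answer.lower()
    return [
        LabeledParagraph(text=p, label=int(answer_lower in p.lower()))
        for p in paragraphs
    ]


if __name__ == "__main__":
    # Toy usage: two retrieved paragraphs for one question/answer pair.
    retrieved = [
        "Ottawa is the capital of Canada.",
        "Toronto is the largest city in Canada.",
    ]
    for example in distant_label(retrieved, answer="Ottawa"):
        print(example.label, example.text)
```

Note that string matching of this kind is noisy: a paragraph can mention the answer without actually supporting it, which is why the paper's careful handling of example difficulty and negative sampling matters.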
Keywords
data augmentation, reranking, BERT