Pre-training Cross-lingual Open Domain Question Answering with Large-scale Synthetic Supervision
CoRR (2024)
Abstract
Cross-lingual question answering (CLQA) is a complex problem, comprising
cross-lingual retrieval from a multilingual knowledge base, followed by answer
generation either in English or the query language. Both steps are usually
tackled by separate models, requiring substantial annotated datasets, and
typically auxiliary resources, like machine translation systems to bridge
between languages. In this paper, we show that CLQA can be addressed using a
single encoder-decoder model. To effectively train this model, we propose a
self-supervised method based on exploiting the cross-lingual link structure
within Wikipedia. We demonstrate how linked Wikipedia pages can be used to
synthesise supervisory signals for cross-lingual retrieval, through a form of
cloze query, and generate more natural queries to supervise answer generation.
Together, we show that our approach outperforms comparable methods
in both supervised and zero-shot language adaptation settings, including those
using machine translation.
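
As a rough illustration of the cloze-style supervision described above, the sketch below masks a hyperlinked mention in a sentence and pairs the resulting query with the linked page's title in other languages. This is not the authors' pipeline: the `<mask>` token, the `ClozeExample` type, the `make_cloze_examples` function, and the input layout (anchor spans plus a language-link dictionary) are all assumptions made for the example.

```python
from dataclasses import dataclass

MASK = "<mask>"  # hypothetical placeholder token, not taken from the paper


@dataclass
class ClozeExample:
    query: str         # sentence with the linked mention masked out
    answer: str        # the mention text that was masked
    target_lang: str   # language of the linked page used as retrieval target
    target_title: str  # title of the linked page in that language


def make_cloze_examples(sentence, anchors, langlinks):
    """Build cloze queries from one sentence.

    anchors:   list of (start, end, page_title) character spans in `sentence`
    langlinks: dict mapping page_title -> {lang_code: title_in_that_language}
    Both input formats are assumed for illustration only.
    """
    examples = []
    for start, end, title in anchors:
        mention = sentence[start:end]
        query = sentence[:start] + MASK + sentence[end:]
        # One example per language version of the linked page.
        for lang, foreign_title in sorted(langlinks.get(title, {}).items()):
            examples.append(ClozeExample(query, mention, lang, foreign_title))
    return examples


sentence = "Marie Curie was born in Warsaw."
start = sentence.index("Warsaw")
anchors = [(start, start + len("Warsaw"), "Warsaw")]
langlinks = {"Warsaw": {"de": "Warschau", "pl": "Warszawa"}}

for example in make_cloze_examples(sentence, anchors, langlinks):
    print(example)
```

Each resulting pair links an English cloze query to a page in another language, which is the kind of signal a cross-lingual retriever could be trained on under these assumptions.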