AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

Odunayo Ogundepo, Tajuddeen Gwadabe, Clara E. Rivera, Jonathan H. Clark, Sebastian Ruder, David Ifeoluwa Adelani, Bonaventure F. P. Dossou, Abdou Aziz DIOP, Claytone Sikasote, Gilles Hacheme, Happy Buzaaba, Ignatius Ezeani, Rooweither Mabuya, Salomey Osei, Chris Chinenye Emezue, Albert Njoroge Kahira, Shamsuddeen Hassan Muhammad, Akintunde Oladipo, Abraham Toluwase Owodunni, Atnafu Lambebo Tonja, Iyanuoluwa Shode, Akari Asai, Tunde Oluwaseyi Ajayi, Clemencia Siro, Steven Arthur, Mofetoluwa Adeyemi, Orevaoghene Ahia, Anuoluwapo Aremu, Oyinkansola Awosan, Chiamaka Chukwuneke, Bernard Opoku, Adeshina O.S. Ayodele, Verrah A Otiende, Christine Mwase, Boyd Sinkala, Andre Niyongabo Rubungo, Daniel Ajisafe, Emeka Onwuegbuzia, Habib Mbow, Emile Niyomutabazi, Eunice Mukonde, Falalu Ibrahim Lawan, Ibrahim Said Ahmad, Jesujoba Alabi, Martin Namukombo, Mbonu Chinedu, Mofya Phiri, Neo Putini, Ndumiso Mngoma, Priscilla Amuok, Ruqayya Nasir Iro, Sonia Adhiambo

arXiv (Cornell University), 2023

Abstract
African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems, which retrieve answer content from other languages while serving people in their native language, offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.
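The abstract describes XOR QA systems as retrieving answer content in another language while serving users in their own. One common way to build such a system is a translate-retrieve-read pipeline: translate the question into a higher-resource pivot language, retrieve a pivot-language passage, extract an answer, and translate it back. The sketch below is illustrative only and is not the AfriQA baseline: the function names, the word-overlap retriever, the toy reader, and the lookup-table "translators" are all hypothetical stand-ins for real MT, retrieval, and reading models.

```python
from typing import Callable, List, Set, Tuple

PUNCT = ".,?!"


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    return {tok.strip(PUNCT).lower() for tok in text.split() if tok.strip(PUNCT)}


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Toy sparse retriever (an assumption): rank passages by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def xor_qa(
    question: str,
    translate_to_pivot: Callable[[str], str],
    passages: List[str],
    read: Callable[[str, str], str],
    translate_from_pivot: Callable[[str], str],
) -> Tuple[str, str]:
    """Hypothetical translate-retrieve-read pipeline.

    Returns (answer in the pivot language, answer translated back for the user).
    """
    pivot_question = translate_to_pivot(question)          # 1. query translation
    top_passage = retrieve(pivot_question, passages)[0]    # 2. pivot-language retrieval
    pivot_answer = read(pivot_question, top_passage)       # 3. answer extraction
    return pivot_answer, translate_from_pivot(pivot_answer)  # 4. answer translation


if __name__ == "__main__":
    # Placeholder key stands in for a native-language question; not a real translation.
    to_english = {"QUESTION_IN_NATIVE_LANGUAGE": "What is the capital of Nigeria?"}.get
    english_passages = [
        "Abuja is the capital of Nigeria.",
        "Lagos is the largest city in Nigeria.",
        "Nairobi is the capital of Kenya.",
    ]
    first_word_reader = lambda q, p: p.split()[0]  # toy reader: first word of passage
    print(xor_qa("QUESTION_IN_NATIVE_LANGUAGE", to_english, english_passages,
                 first_word_reader, lambda a: a))
```

Each stage is passed in as a function so that real components (an MT system, a multilingual dense retriever, a neural reader) could be swapped in; the abstract reports that the translation and multilingual retrieval stages are where current methods perform poorly.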
Keywords
african, languages, answering, cross-lingual, open-retrieval