BiMediX: Bilingual Medical Mixture of Experts LLM
CoRR (2024)
Abstract
In this paper, we introduce BiMediX, the first bilingual medical mixture of
experts LLM designed for seamless interaction in both English and Arabic. Our
model facilitates a wide range of medical interactions in English and Arabic,
including multi-turn chats to inquire about additional details such as patient
symptoms and medical history, multiple-choice question answering, and
open-ended question answering. We propose a semi-automated English-to-Arabic
translation pipeline with human refinement to ensure high-quality translations.
We also introduce a comprehensive evaluation benchmark for Arabic medical LLMs.
Furthermore, we introduce BiMed1.3M, an extensive Arabic-English bilingual
instruction set covering 1.3 million diverse medical interactions, resulting in
over 632 million healthcare-specialized tokens for instruction tuning. Our
BiMed1.3M dataset includes 250k synthesized multi-turn doctor-patient chats and
maintains a 1:2 Arabic-to-English ratio. Our model outperforms state-of-the-art
Med42 and Meditron by average absolute gains of 2.5% [...]
computed across multiple medical evaluation benchmarks in English, while
operating at 8-times faster inference. Moreover, our BiMediX outperforms the
generic Arabic-English bilingual LLM, Jais-30B, by average absolute gains of
10% [...]
multiple datasets. Our project page with source code and trained model is
available at https://github.com/mbzuai-oryx/BiMediX .