RareBench: Can LLMs Serve as Rare Diseases Specialists?
CoRR (2024)
Abstract
Generalist Large Language Models (LLMs), such as GPT-4, have shown
considerable promise in various domains, including medical diagnosis. Rare
diseases, affecting approximately 300 million people worldwide, often have
unsatisfactory clinical diagnosis rates primarily due to a lack of experienced
physicians and the complexity of differentiating among many rare diseases. In
this context, recent news reports such as "ChatGPT correctly diagnosed a
4-year-old's rare disease after 17 doctors failed" underscore LLMs' potential,
yet underexplored, role in clinically diagnosing rare diseases. To bridge this
research gap, we introduce RareBench, a pioneering benchmark designed to
systematically evaluate the capabilities of LLMs on 4 critical dimensions
within the realm of rare diseases. Meanwhile, we have compiled the largest
open-source dataset on rare disease patients, establishing a benchmark for
future studies in this domain. To facilitate differential diagnosis of rare
diseases, we develop a dynamic few-shot prompt methodology, leveraging a
comprehensive rare disease knowledge graph synthesized from multiple knowledge
bases, significantly enhancing LLMs' diagnostic performance. Moreover, we
present an exhaustive comparative study of GPT-4's diagnostic capabilities
against those of specialist physicians. Our experimental findings underscore
the promising potential of integrating LLMs into the clinical diagnostic
process for rare diseases. This paves the way for exciting possibilities in
future advancements in this field.
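The abstract's dynamic few-shot prompt methodology can be pictured as retrieving the patient cases most similar to the current patient's phenotypes and using them as in-context demonstrations. The sketch below is an illustrative assumption, not the paper's actual implementation: the function names, the toy HPO-style case bank, and the plain Jaccard similarity (the paper instead draws on a rare disease knowledge graph synthesized from multiple knowledge bases) are all hypothetical.

```python
# Hypothetical sketch of dynamic few-shot prompting for rare-disease
# diagnosis. All names and data here are illustrative assumptions;
# the paper's method uses a rare disease knowledge graph instead of
# the simple set similarity shown below.

def phenotype_similarity(a, b):
    """Jaccard similarity between two sets of phenotype (HPO) terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def select_few_shot_cases(patient_terms, case_bank, k=2):
    """Pick the k cases whose phenotype sets most resemble the patient's."""
    ranked = sorted(
        case_bank,
        key=lambda c: phenotype_similarity(patient_terms, c["phenotypes"]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(patient_terms, examples):
    """Assemble a diagnosis prompt with retrieved cases as demonstrations."""
    lines = []
    for ex in examples:
        lines.append(f"Phenotypes: {', '.join(sorted(ex['phenotypes']))}")
        lines.append(f"Diagnosis: {ex['diagnosis']}")
    lines.append(f"Phenotypes: {', '.join(sorted(patient_terms))}")
    lines.append("Diagnosis:")
    return "\n".join(lines)

# Toy case bank with HPO-style identifiers (illustrative data only).
case_bank = [
    {"phenotypes": {"HP:0001250", "HP:0001263"}, "diagnosis": "Dravet syndrome"},
    {"phenotypes": {"HP:0000365", "HP:0000505"}, "diagnosis": "Usher syndrome"},
    {"phenotypes": {"HP:0001250", "HP:0002133"}, "diagnosis": "GLUT1 deficiency"},
]

patient = {"HP:0001250", "HP:0001263", "HP:0002133"}
shots = select_few_shot_cases(patient, case_bank, k=2)
prompt = build_prompt(patient, shots)
print(prompt)
```

Because the demonstrations are chosen per patient rather than fixed, the prompt adapts to each presentation, which is the intuition behind the reported gains in differential-diagnosis performance.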