Benchmarking the symptom-checking capabilities of ChatGPT for a broad range of diseases

Anjun Chen, Drake O. Chen, Lu Tian

Journal of the American Medical Informatics Association: JAMIA (2023)

Abstract
Objective: This study evaluates ChatGPT's symptom-checking accuracy across a broad range of diseases, using the Mayo Clinic Symptom Checker patient service as a benchmark.

Methods: We prompted ChatGPT with symptoms of 194 distinct diseases. By comparing its predictions with expectations, we calculated a relative comparative score (RCS) to gauge accuracy.

Results: ChatGPT's GPT-4 model achieved an average RCS of 78.8%, outperforming GPT-3.5-turbo by 10.5%. Some specialties scored above 90%.

Discussion: The test set, although extensive, was not exhaustive. Future studies should include a more comprehensive disease spectrum.

Conclusion: ChatGPT exhibits high accuracy in symptom checking for a broad range of diseases, showcasing its potential as a medical training tool in learning health systems to enhance care quality and address health disparities.
Keywords
symptom checking, ChatGPT, benchmarking, learning health system, medical training