Benchmarking the symptom-checking capabilities of ChatGPT for a broad range of diseases
Journal of the American Medical Informatics Association (JAMIA), 2023
Abstract
Objective: This study evaluates ChatGPT's symptom-checking accuracy across a broad range of diseases, using the Mayo Clinic Symptom Checker patient service as a benchmark.

Methods: We prompted ChatGPT with symptoms of 194 distinct diseases. By comparing its predictions with expectations, we calculated a relative comparative score (RCS) to gauge accuracy.

Results: ChatGPT's GPT-4 model achieved an average RCS of 78.8%, outperforming GPT-3.5-turbo by 10.5%. Some specialties scored above 90%.

Discussion: The test set, although extensive, was not exhaustive. Future studies should include a more comprehensive disease spectrum.

Conclusion: ChatGPT exhibits high accuracy in symptom checking for a broad range of diseases, showcasing its potential as a medical training tool in learning health systems to enhance care quality and address health disparities.
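The abstract does not define how the relative comparative score (RCS) is computed, so the following is only a hypothetical sketch of one plausible scoring scheme: score each test case by how highly the expected disease ranks among the model's predictions, then average over all cases. The function names, the linear rank penalty, and the `max_rank` cutoff are all assumptions for illustration, not the paper's actual method.

```python
# Hypothetical RCS sketch -- the paper does not specify its formula.
# Assumes: full credit if the expected disease is the top prediction,
# linearly decreasing credit with rank, zero if it is absent.

def case_score(predictions, expected, max_rank=10):
    """Score one test case from a ranked list of predicted diseases."""
    try:
        rank = predictions.index(expected)  # 0-based position in the list
    except ValueError:
        return 0.0  # expected disease not predicted at all
    return max(0.0, 1.0 - rank / max_rank)

def average_score(cases):
    """Average per-case scores over all (predictions, expected) pairs."""
    scores = [case_score(preds, exp) for preds, exp in cases]
    return sum(scores) / len(scores)
```

Under this sketch, a benchmark run over the 194 diseases would collect one ranked prediction list per prompt and report the mean score as a percentage.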
Keywords
symptom checking, ChatGPT, benchmarking, learning health system, medical training