Development and Evaluation of Aeyeconsult: A Novel Ophthalmology Chatbot Leveraging Verified Textbook Knowledge and GPT-4

Maxwell B. Singer, Julia J. Fu, Jessica Chow, Christopher C. Teng

Journal of Surgical Education (2024)

Abstract
OBJECTIVE: There has been much excitement about the use of large language models (LLMs) such as ChatGPT in ophthalmology. However, LLMs are limited in that they are trained on unverified information and do not cite their sources. This paper presents a new methodology for building a generative AI chatbot that answers eye-care-related questions using only verified ophthalmology textbooks as its data source and citing those sources.

SETTING: Yale School of Medicine, Department of Ophthalmology and Visual Science.

DESIGN/METHODS: Aeyeconsult, an ophthalmology chatbot, was developed using GPT-4 (the LLM that powers the publicly available chatbot ChatGPT-4), LangChain, and Pinecone. Ophthalmology textbooks were processed into embeddings and stored in Pinecone. User queries were similarly converted to embeddings, compared against the stored embeddings, and GPT-4 generated responses from the retrieved material. The interface was adapted from publicly available code. Both Aeyeconsult and ChatGPT-4 were tested on the same 260 questions from OphthoQuestions.com, with the first response from each recorded as its answer.

RESULTS: Aeyeconsult outperformed ChatGPT-4 on the OKAP dataset, answering 83.4% of questions correctly compared with 69.2% (p = 0.0118). Aeyeconsult also produced fewer instances of no answer and of multiple answers. Both systems performed best in General Medicine, where Aeyeconsult achieved 96.2% accuracy. Aeyeconsult's weakest category was Clinical Optics at 68.1%, but it still outperformed ChatGPT-4 in that category (45.5%).

CONCLUSION: LLMs may be useful for answering ophthalmology questions, but their trustworthiness and accuracy are limited by training on unverified internet data and lack of source citation. We used a new methodology, drawing on verified ophthalmology textbooks as source material and providing citations, to mitigate these issues, yielding a chatbot more accurate than ChatGPT-4 at answering OKAP-style questions. (J Surg Ed 81:438-443. (c) 2023 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.)
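The retrieval step described in DESIGN/METHODS — embedding textbook passages, embedding the user's query the same way, and returning the most similar passage together with its source citation — can be sketched as follows. This is a minimal illustration with toy vectors, not the authors' actual LangChain/Pinecone implementation; the vector store contents and citations here are invented for demonstration, and a real system would use an embedding model and a vector database rather than in-memory lists.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector store": (embedding, passage text, citation) triples.
# In the paper's pipeline these would be textbook chunks stored in Pinecone.
store = [
    ([0.9, 0.1, 0.0],
     "Open-angle glaucoma is typically managed first with topical therapy.",
     "Textbook A, ch. 10"),
    ([0.1, 0.8, 0.2],
     "The crystalline lens contributes roughly 20 D of refractive power.",
     "Textbook B, ch. 2"),
]

def retrieve(query_embedding, k=1):
    """Return the top-k passages by cosine similarity, with their citations.

    The retrieved (passage, citation) pairs would then be supplied to the
    LLM as grounding context, so the generated answer can cite its source.
    """
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_embedding, item[0]),
        reverse=True,
    )
    return [(text, cite) for _, text, cite in ranked[:k]]
```

Because every retrieved passage carries a citation, the chatbot can attach verifiable sources to each answer — the property the paper identifies as missing from a plain ChatGPT-4 response.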
Keywords
ChatGPT, artificial intelligence, chatbot, large language models, OKAPs