Methodological insights into ChatGPT’s screening performance in systematic reviews

BMC Medical Research Methodology(2024)

引用 0|浏览0
暂无评分
摘要
The screening process for systematic reviews and meta-analyses in medical research is a labor-intensive and time-consuming task. While machine learning and deep learning have been applied to facilitate this process, these methods often require training data and user annotation. This study aims to assess the efficacy of ChatGPT, a large language model based on the Generative Pretrained Transformers (GPT) architecture, in automating the screening process for systematic reviews in radiology without the need for training data. A prospective simulation study was conducted between May 2nd and 24th, 2023, comparing ChatGPT’s performance in screening abstracts against that of general physicians (GPs). A total of 1198 abstracts across three subfields of radiology were evaluated. Metrics such as sensitivity, specificity, positive and negative predictive values (PPV and NPV), workload saving, and others were employed. Statistical analyses included the Kappa coefficient for inter-rater agreement, ROC curve plotting, AUC calculation, and bootstrapping for p-values and confidence intervals. ChatGPT completed the screening process within an hour, while GPs took an average of 7–10 days. The AI model achieved a sensitivity of 95
更多
查看译文
关键词
Systematic review,ChatGPT,AI,Large language model,Article screening,Radiology,GPT
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要