PP54 Machine Learning For Accelerating Screening In Literature Reviews

Mary Chappell, Mary Edwards, Deborah Watkins, Christopher Marshall, Lavinia Ferrante di Ruffano, Anita Fitzgerald, Sara Graziadio

International Journal of Technology Assessment in Health Care (2023)

Abstract
Introduction: Systematic reviews are important for informing decision-making and primary research, but they can be time-consuming and costly. With the advent of machine learning, there is an opportunity to accelerate study screening in the review process. We aimed to understand the literature in order to make decisions about the use of machine learning for screening in our review workflow.

Methods: We conducted a pragmatic literature review of PubMed to obtain studies evaluating the accuracy of publicly available machine learning screening tools. A single reviewer used 'snowballing' searches to identify studies reporting accuracy data and extracted sensitivity (the ability to correctly identify studies to be included in a review) and specificity, or workload saved (the ability to correctly exclude irrelevant studies).

Results: Ten tools (AbstractR, ASReview Lab, Cochrane RCT classifier, Concept encoder, Dpedia, DistillerAI, Rayyan, Research Screener, Robot Analyst, SWIFT-active screener) were evaluated across a total of 16 studies. Fourteen studies were single-arm: although tools were compared with a reference standard (predominantly single-reviewer screening), there was no other comparator. Two studies were comparative, with tools compared against other tools as well as a reference standard. All tools ranked records by probability of inclusion and either (i) applied a cut-point to exclude records or (ii) were used to rank and re-rank records over screening iterations, with screening continuing until most relevant records had been obtained. The accuracy of tools varied widely between studies and review projects. When used in method (ii), at 95 percent to 100 percent sensitivity, tools achieved workload savings of between 7 percent and 99 percent. It was unclear whether evaluations were conducted independently of tool developers.

Conclusions: Evaluations suggest that tools have the potential to correctly classify studies during screening. However, conclusions are limited because (i) tool accuracy is generally not compared with dual-reviewer screening and (ii) the literature lacks comparative studies, and between-study heterogeneity makes it impossible to robustly determine the accuracy of tools relative to each other. Independent evaluations are needed.
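Below is an illustrative sketch (not taken from the paper) of how the workload-saved figure reported above can be derived once a tool has ranked records by predicted probability of inclusion and screening stops at a target sensitivity such as 95 percent. The function name, example data, and stopping rule are hypothetical.

def workload_saved_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Return (records screened, proportion of workload saved) at the target sensitivity.

    scores: predicted probability of inclusion per record (higher = more likely relevant)
    labels: 1 if the record is truly relevant to the review, else 0
    """
    # Screen records in descending order of predicted relevance.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    needed = target_sensitivity * sum(labels)
    found = 0
    for screened, idx in enumerate(order, start=1):
        found += labels[idx]
        if found >= needed:
            # Workload saved: share of records that never require manual screening.
            return screened, 1 - screened / len(scores)
    return len(scores), 0.0

# Hypothetical example: 10 records, 2 of which are truly relevant.
scores = [0.9, 0.1, 0.8, 0.2, 0.05, 0.3, 0.15, 0.4, 0.25, 0.07]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
screened, saved = workload_saved_at_sensitivity(scores, labels)
print(f"Screened {screened} of {len(scores)} records; workload saved = {saved:.0%}")

In the iterative approach described as method (ii), the ranking would be refreshed after each batch of manual screening decisions rather than fixed in advance, but the workload-saved calculation at a given sensitivity is the same.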