A systematic review of the application of machine learning methods for patient recruitment through electronic health records (Preprint)

semanticscholar(2021)

Abstract
BACKGROUND: Electronic health records (EHRs) offer the potential for more efficient patient recruitment into clinical studies. In recent years, machine learning techniques have gained increasing popularity in EHR-based research, with several studies reporting that machine learning methods can be used for disease diagnosis, outcome prediction, and treatment personalization.

OBJECTIVE: To explore which machine learning methods have been used to recruit patients through EHRs and to compare their characteristics and outcomes.

METHODS: Searches were conducted in MEDLINE, Embase, Scopus, and OpenGrey for papers published before June 20, 2019. The search strategy incorporated both relevant subject headings and relevant terms in the title, abstract, and keywords. Two reviewers independently screened papers, decided which to include, extracted data, and assessed risk of bias for each included paper. Conflicts were resolved by a third reviewer. Included studies were compared in terms of year of publication, study location, study type, EHR setting, study aim, clinical trial domain, size of the dataset, models or methods, data types, data processing and feature selection, evaluation, and outcomes.

RESULTS: Eleven papers were included for synthesis. Ten were in-silico studies that simulated the prediction of participant recruitment on computers. One study evaluated a machine learning assisted recruitment procedure in a real clinical setting. The in-silico studies covered diverse empirical frameworks in terms of the number of trials used, trial domains, unit of analysis, size of the dataset, outcome definition, methods of data pre-processing, model building and evaluation, and performance measures. Different machine learning methods seem to perform similarly when evaluated under the same circumstances. In similar experimental set-ups, NLP-incorporated similarity comparison appears more likely to perform better than similarity comparison applied to structured data alone. A single performance measure is not sufficient to fully evaluate a method. All the in-silico studies were judged to be at high risk of bias. The sole interventional study reported that a significantly smaller percentage of time was spent on electronic screening when using the system than when not using it (P < 0.001), but this study was assigned a critical risk of bias in the bias assessment phase.

CONCLUSIONS: Natural language processing techniques might be able to boost the accuracy of identifying eligible participants. However, complex methods place greater demands on the availability of patient data and, because of the high risk of overfitting, need to be customized to the target EHR system. It would be valuable if future in-silico studies in this domain provided more detail on data pre-processing, modelling, and evaluation, and reported different types of performance measures. More interventional studies are needed to generate higher-quality evidence.

CLINICALTRIAL: Review registration: PROSPERO CRD42018103355
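
The observation that a single performance measure is not sufficient can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from any included study; they assume a screening task in which eligible patients are rare (20 of 1,000), and the function name `screening_metrics` is introduced here only for illustration.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute several performance measures from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical cohort: 1,000 patients, of whom 20 are truly eligible.
# Model A never flags anyone as eligible.
never_flags = screening_metrics(tp=0, fp=0, fn=20, tn=980)
# Model B finds 16 of the 20 eligible patients at the cost of 40 false alarms.
finds_most = screening_metrics(tp=16, fp=40, fn=4, tn=940)

for name, metrics in [("never_flags", never_flags), ("finds_most", finds_most)]:
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Under these assumed counts, the model that never flags a patient has the higher accuracy even though its recall is zero, which is why reporting accuracy alone would hide the fact that it is useless for recruitment.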