A transparent deep learning approach to identify autism spectrum disorders (ASD) in clinical notes and overcome obstacles to machine learning with small datasets. (Preprint)

Gondy Leroy, Madison Preece, Ajay Jaswani, Hyunju Song, Jennifer G. Andrews, Maureen Kelly Galindo, Yang Gu, Sydney Rice

Crossref (2023)

Abstract
BACKGROUND Machine learning (ML) is increasingly employed to predict risk for a variety of medical conditions. Commonly, large datasets are created from electronic health records (EHR), and algorithms assign a single, often binary, label. No rationale is provided for the label, and performance is usually poor when datasets are small.

OBJECTIVE Our approach demonstrates the feasibility of using small datasets with deep ML by first training on intermediate steps for which redundant information is available. We focus on autism spectrum disorders (ASD), a neurodevelopmental condition. We use unstructured data from clinical notes and label individual sentences with criterion labels as described in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5). These sentence labels are then combined into a final case label. Our goal is not to list an extensive set of ML algorithms showing incremental improvements. Instead, we demonstrate how focusing on intermediate steps allows us to work with small datasets, yields excellent outcomes, and adds a transparent decision process.

METHODS We trained our algorithms on 200 cases containing 34,313 sentences and tested them on 35 cases containing 6,773 sentences. We compared one rule-based algorithm and three ML algorithms: one Bidirectional Gated Recurrent Unit (BiGRU) and two Bidirectional Long Short-Term Memory (BiLSTM) neural networks. We also tested six ensembles, formed from three subsets of these algorithms (all algorithms, all ML algorithms, and the two best-performing algorithms), each combining the individual algorithm outputs by either majority vote or inclusive-or. The final case label (ASD or no ASD) is assigned when the required pattern of DSM-5 criteria is present for a case.

RESULTS Using the F-measure as the comparison metric, the best algorithm for labeling sentences with criteria was a multilabel BiLSTM, which achieved, averaged over the seven criteria, 67% precision, 51% recall, and a 0.57 F-measure.
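The two ensemble combination rules named in METHODS can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each model outputs, for every sentence, a set of criterion labels under hypothetical IDs (for example "A1" or "B2"), and it interprets majority vote as a strict majority, keeping a label only when more than half of the models assign it. The abstract does not say how ties are handled; under this interpretation a two-model majority vote keeps only labels that both models agree on, whereas inclusive-or keeps any label that at least one model assigns.

```python
from collections import Counter
from typing import Callable


def majority_vote(predictions: list[set[str]]) -> set[str]:
    """Keep a criterion label if more than half of the models assign it.

    `predictions` holds one label set per model for the same sentence.
    Assumption: strict majority, so with two models both must agree.
    """
    counts = Counter(label for labels in predictions for label in labels)
    return {label for label, n in counts.items() if 2 * n > len(predictions)}


def inclusive_or(predictions: list[set[str]]) -> set[str]:
    """Keep a criterion label if at least one model assigns it (set union)."""
    return set().union(*predictions)


def combine_sentences(
    per_model: list[list[set[str]]],
    method: Callable[[list[set[str]]], set[str]],
) -> list[set[str]]:
    """Combine models sentence by sentence.

    `per_model[m][s]` is model m's label set for sentence s; all models are
    assumed to cover the same sentences in the same order.
    """
    return [method(list(sentence_preds)) for sentence_preds in zip(*per_model)]


if __name__ == "__main__":
    # Toy label sets for two sentences from three models (not real outputs).
    model_1 = [{"A1"}, set()]
    model_2 = [{"A1", "B2"}, {"B3"}]
    model_3 = [{"B2"}, set()]
    models = [model_1, model_2, model_3]
    print(combine_sentences(models, majority_vote))
    print(combine_sentences(models, inclusive_or))
```

With these toy outputs, majority vote keeps A1 and B2 for the first sentence and nothing for the second, while inclusive-or additionally keeps B3 for the second sentence. Because the majority-vote output is always a subset of the inclusive-or output, it can never have higher recall, and it typically has higher precision.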
The best ensemble for sentence labeling combined all ML algorithms using a majority vote. It achieved, on average, 70% precision, 51% recall, and a 0.58 F-measure. A case was labeled as ASD when all three A-criteria and at least two different B-criteria were present. For case labeling, the best outcome was achieved by the ensemble of the two best-performing algorithms using a majority vote. It achieved 100% precision (positive predictive value, PPV), 83% recall (sensitivity), 100% specificity, 91% accuracy, and a 0.91 F-measure.

CONCLUSIONS Our approach shows how ML can be applied in a low-resource domain. Because EHRs generally contain redundant information, lower performance is acceptable at the criterion-labeling step: when the redundant criterion labels are combined, the final case labeling achieves excellent performance. Our approach also adds value through transparency, showing the individual criteria that contributed to each case label. A trained clinician can interpret these labels, a feature lacking in the black-box approaches commonly used with neural networks.
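The case-labeling rule and the case-level metrics reported above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the IDs A1 to A3 and B1 to B4 are hypothetical names for the three A-criteria and four B-criteria of DSM-5 ASD (the seven criteria referred to in RESULTS), and criteria are assumed to be pooled across all sentences of a case. The metric helper uses the standard confusion-matrix definitions, treating ASD as the positive class.

```python
# Hypothetical IDs for the DSM-5 ASD criteria: three A-criteria, four B-criteria.
A_CRITERIA = {"A1", "A2", "A3"}
B_CRITERIA = {"B1", "B2", "B3", "B4"}


def case_label(sentence_labels: list[set[str]]) -> bool:
    """Return True (ASD) if all A-criteria and at least two distinct B-criteria
    are found among the criterion labels of a case's sentences.

    Assumption: labels are pooled across every sentence in the case.
    """
    found = set().union(*sentence_labels)
    has_all_a = A_CRITERIA <= found
    distinct_b = len(found & B_CRITERIA)
    return has_all_a and distinct_b >= 2


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def case_metrics(gold: list[bool], predicted: list[bool]) -> dict[str, float]:
    """Standard binary metrics, treating ASD (True) as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    tn = sum(1 for g, p in pairs if not g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0  # PPV
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure(precision, recall),
    }
```

As a consistency check on the reported case-level figures, `f_measure(1.00, 0.83)` evaluates to about 0.907, which rounds to the reported 0.91 F-measure.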