Automatically pre-screening patients for the rare disease aromatic l-amino acid decarboxylase deficiency using knowledge engineering, natural language processing, and machine learning on a large EHR population

Aaron M. Cohen, Jolie Kaner, Ryan Miller, Jeffrey W. Kopesky,William Hersh

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION(2024)

引用 0|浏览1
暂无评分
摘要
Objectives: Electronic health record (EHR) data may facilitate the identification of rare diseases in patients, such as aromatic l-amino acid decarboxylase deficiency (AADCd), an autosomal recessive disease caused by pathogenic variants in the dopa decarboxylase gene. Deficiency of the AADC enzyme results in combined severe reductions in monoamine neurotransmitters: dopamine, serotonin, epinephrine, and norepinephrine. This leads to widespread neurological complications affecting motor, behavioral, and autonomic function. The goal of this study was to use EHR data to identify previously undiagnosed patients who may have AADCd without available training cases for the disease.Materials and Methods: A multiple symptom and related disease annotated dataset was created and used to train individual concept classifiers on annotated sentence data. A multistep algorithm was then used to combine concept predictions into a single patient rank value.Results: Using an 8000-patient dataset that the algorithms had not seen before ranking, the top and bottom 200 ranked patients were manually reviewed for clinical indications of performing an AADCd diagnostic screening test. The top-ranked patients were 22.5% positively assessed for diagnostic screening, with 0% for the bottom-ranked patients. This result is statistically significant at P < .0001.Conclusion: This work validates the approach that large-scale rare-disease screening can be accomplished by combining predictions for relevant individual symptoms and related conditions which are much more common and for which training data is easier to create.
更多
查看译文
关键词
machine learning,rare diseases,natural language processing,EHR data secondary uses,aromatic <sc>l-</sc>amino acid decarboxylase deficiency
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要