NER Sequence Embedding of Unified Medical Corpora to incorporate Semantic Intelligence in Big Data Healthcare Diagnostics

Research Square (2023)

Abstract: Clinical diagnosis is a challenging task that requires a high level of expertise from doctors. It is recognized that integrating technology with the clinical domain would facilitate the diagnostic process. Intelligent analytics require a semantic understanding of the medical domain and clinical context, and they need to learn that medical context for the different purposes of diagnosing and treating patients. Traditional diagnoses are made from phenotype features in patients' profiles. Diabetes mellitus (DM) is a chronic disease that widely affects the population and requires timely diagnosis. The motivation for this research comes from a gap: there is no common ground for medical context learning in analytics that diagnose DM and its comorbid diseases. A unified medical knowledge base is therefore important for learning contextual Named Entity Recognition (NER) embeddings for semantic intelligence. Our search for possible solutions showed that unified corpora tagged with medical terms were missing for training analytics to diagnose DM and its comorbidities. Hence, we collected corpora of endocrine diagnostic electronic health records (EHR) for clinical purposes, labeled with the ICD-10-CM international coding scheme. The International Classification of Diseases (ICD), maintained by the World Health Organization (WHO), is a well-known schema for representing medical codes for diagnoses. Together, the endocrine EHR corpora make up the DM-Comorbid-EHR-ICD-10 Corpora, which is tagged so that medical context can be understood uniformly. We experimented with different NER sequence embedding approaches using advanced ML integrated with NLP techniques. The experiments used common frameworks such as spaCy, Flair, and TensorFlow/Keras.
In our experiments, label sets in the form of (instance, label) pairs for diagnoses were tagged with the Sequential() model in TensorFlow.Keras, using Bi-LSTM and dense layers. The maximum accuracy achieved was 0.9, for Corpus14407_DM_pts_33185 with the maximum number of diagnostic features taken as input. The diagnostic accuracy of the sequential DNN NER model increased as the corpus grew from 100 to 14407 DM patients with comorbid diseases. The diagnostic accuracy clearly shows the significance of clinical notes and practitioner comments available as free text.
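As a rough illustration of the (instance, label) pairing mentioned above, the sketch below turns free-text diagnostic notes and their ICD-10-CM codes into padded integer sequences and integer class ids, the form a Bi-LSTM sequence model typically takes as input. It is not the authors' pipeline: the toy records, the helper names (`build_vocab`, `encode`, `encode_pairs`), the whitespace tokenizer, and the `max_len` value are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy records (not from the paper's corpus):
# each item is a (free-text note, ICD-10-CM code) pair.
PAIRS = [
    ("type 2 diabetes mellitus without complications", "E11.9"),
    ("essential hypertension", "I10"),
    ("type 2 diabetes mellitus with hyperglycemia", "E11.65"),
]

PAD, UNK = 0, 1  # reserved ids for padding and unseen tokens


def build_vocab(texts):
    """Map each whitespace token to an integer id, most frequent first."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common())}


def encode(text, vocab, max_len):
    """Convert a note to a fixed-length id list: truncate, then right-pad."""
    ids = [vocab.get(tok, UNK) for tok in text.lower().split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def encode_pairs(pairs, max_len=8):
    """Encode (instance, label) pairs into model-ready X and y lists."""
    texts = [text for text, _ in pairs]
    vocab = build_vocab(texts)
    # Sort the codes so that label ids are deterministic across runs.
    label_ids = {code: i for i, code in enumerate(sorted({c for _, c in pairs}))}
    X = [encode(text, vocab, max_len) for text in texts]
    y = [label_ids[code] for _, code in pairs]
    return X, y, vocab, label_ids


X, y, vocab, label_ids = encode_pairs(PAIRS)
```

Under these assumptions, `X` and `y` could be converted to arrays and passed to an embedding layer followed by Bi-LSTM and dense layers. A real clinical corpus would need a proper tokenizer and handling of patients with multiple codes.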
Keywords: unified medical corpora, big data healthcare diagnostics, semantic intelligence, NER sequence