Using Unsupervised Natural Language Processing to Automatically Identify Chronicity and Extent of Inflammation on Ileocolonoscopy From Pathology Reports

The American Journal of Gastroenterology (2023)

Abstract
Introduction: Ascertaining disease location in inflammatory bowel disease requires manual review of unstructured pathology reports, which vary in style and terminology. Natural Language Processing (NLP) technologies can extract data from free text in the electronic medical record (EMR). We aimed to develop and validate an unsupervised NLP-based algorithm that identifies the presence of ileal and/or colonic inflammation, and differentiates acute from chronic inflammation, in pathology reports within an integrated EMR system.

Methods: We developed an unsupervised, rule-based regular expression NLP algorithm to identify keywords corresponding to the findings of acute or chronic ‘ileitis’, ‘colitis’, ‘crypt architectural distortion’ and ‘granulomas’, alongside a list of negation terms, within pathology reports. The algorithm’s performance was evaluated against the manual interpretation of the same pathology reports by authors NSL, SB, and MK. A portion of the pathology reports was reviewed by multiple authors to ensure adequate inter-observer agreement. The algorithm’s performance was reported as accuracy, sensitivity, precision, and F-measure.

Results: We queried 9508 pathology reports spanning a 36-month period and identified 649 reports with findings of acute or chronic inflammation on colonoscopy. Compared with manual review of pathology reports, the NLP algorithm demonstrated high accuracy in detecting acute colitis (93.5%), chronic colitis (80.2%), acute ileitis (96.4%), chronic ileitis (86.4%) and the presence of granulomas (98.7%). Detailed performance across the variables studied is shown in Table 1.

Conclusion: An unsupervised NLP approach identified the location and chronicity of inflammation from biopsies with a high degree of accuracy. We expect our algorithm’s performance to improve further with the use of training sets with expert input.
Application of this algorithm has the potential to improve patient identification to enhance research and clinical care across large EMRs.

Table 1. Performance characteristics of the NLP algorithm for detecting acute and chronic colitis, acute and chronic ileitis, and granulomas from pathology reports

Variable (N)               Accuracy (%)  Sensitivity (%)  Specificity (%)  Precision (%)  F-Measure
Colitis, acute (n=31)      93.5          12.9             99.3             57.1           22.8
Colitis, chronic (n=411)   80.2          79.6             84.1             96.7           81.8
Ileitis, acute (n=19)      96.4          15.8             99.8             75             27.3
Ileitis, chronic (n=137)   86.4          52.6             100              100            68.9
Granulomas (n=51)          98.7          90.2             98.1             85.2           94
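The Methods describe a rule-based regular expression approach with a list of negation terms, but the abstract does not give the actual patterns or term lists. The following is a minimal sketch of how such a matcher might look. The keyword patterns, the negation list, the sentence-splitting rule, and the function name `extract_findings` are all illustrative assumptions, not the authors' implementation.

```python
import re
from typing import Set

# Hypothetical keyword patterns for the four findings named in the abstract.
FINDING_PATTERNS = {
    "ileitis": re.compile(r"\bileitis\b", re.IGNORECASE),
    "colitis": re.compile(r"\bcolitis\b", re.IGNORECASE),
    "crypt_distortion": re.compile(
        r"\bcrypt\s+architectural\s+distortion\b", re.IGNORECASE
    ),
    "granulomas": re.compile(r"\bgranulomas?\b", re.IGNORECASE),
}

# Hypothetical chronicity qualifiers, applied only to ileitis and colitis.
CHRONICITY_PATTERNS = {
    "acute": re.compile(r"\bacute\b", re.IGNORECASE),
    "chronic": re.compile(r"\bchronic\b", re.IGNORECASE),
}

# Hypothetical negation terms; the paper's actual list is not given.
NEGATION = re.compile(
    r"\b(no evidence of|negative for|absence of|without|not|no)\b",
    re.IGNORECASE,
)


def extract_findings(report: str) -> Set[str]:
    """Return the non-negated findings mentioned in a pathology report."""
    found: Set[str] = set()
    # Crude sentence split on periods, semicolons, and newlines.
    for sentence in re.split(r"[.;\n]+", report):
        for name, pattern in FINDING_PATTERNS.items():
            match = pattern.search(sentence)
            if match is None:
                continue
            # Only text preceding the keyword in the same sentence is checked.
            prefix = sentence[: match.start()]
            if NEGATION.search(prefix):
                continue
            if name in ("ileitis", "colitis"):
                qualifiers = [
                    label
                    for label, qualifier in CHRONICITY_PATTERNS.items()
                    if qualifier.search(prefix)
                ]
                if qualifiers:
                    found.update(f"{label}_{name}" for label in qualifiers)
                else:
                    found.add(name)
            else:
                found.add(name)
    return found


report = (
    "Terminal ileum: chronic ileitis with crypt architectural distortion. "
    "Colon: no evidence of acute colitis. No granulomas identified."
)
print(sorted(extract_findings(report)))
```

This sketch only handles negation terms that precede the keyword within the same sentence. Phrasings such as "granulomas are not identified" would need an additional rule, which is one reason real report text tends to require a broader term list.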
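The performance measures reported above can be computed from a comparison of algorithm output against manual review. The sketch below uses the standard definitions, with F-measure taken as the harmonic mean of precision and sensitivity; the abstract does not state which F-measure variant was used, so that choice is an assumption. The helper names and the example labels are hypothetical and are not data from the study.

```python
from typing import Dict, Sequence


def confusion_counts(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Dict[str, int]:
    """Count true/false positives and negatives against manual review."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            counts["tp"] += 1
        elif pred and not ref:
            counts["fp"] += 1
        elif not pred and ref:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def performance(counts: Dict[str, int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, precision, and F-measure."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    sensitivity = safe_div(tp, tp + fn)
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "precision": precision,
        # Assumed variant: harmonic mean of precision and sensitivity.
        "f_measure": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical labels for six reports: algorithm output vs. manual review.
predicted = [True, True, False, False, True, False]
reference = [True, False, False, True, True, False]

counts = confusion_counts(predicted, reference)
scores = performance(counts)
for metric, value in scores.items():
    print(f"{metric}: {value:.3f}")
```

When a finding is rare, as with the acute categories in Table 1, accuracy can remain high even if sensitivity is low, which is why the table reports the measures side by side.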
Keywords
ileocolonoscopy,unsupervised natural language processing,inflammation,pathology