Extracting Pulmonary Nodules and Nodule Characteristics from Radiology Reports of Lung Cancer Screening Patients Using Transformer Models

Journal of Healthcare Informatics Research(2024)

引用 0|浏览0
暂无评分
摘要
Pulmonary nodules and nodule characteristics are important indicators of lung nodule malignancy. However, nodule information is often documented as free text in clinical narratives such as radiology reports in electronic health record systems. Natural language processing (NLP) is the key technology to extract and standardize patient information from radiology reports into structured data elements. This study aimed to develop an NLP system using state-of-the-art transformer models to extract pulmonary nodules and associated nodule characteristics from radiology reports. We identified a cohort of 3080 patients who underwent LDCT at the University of Florida health system and collected their radiology reports. We manually annotated 394 reports as the gold standard. We explored eight pretrained transformer models from three transformer architectures including bidirectional encoder representations from transformers (BERT), robustly optimized BERT approach (RoBERTa), and A Lite BERT (ALBERT), for clinical concept extraction, relation identification, and negation detection. We examined general transformer models pretrained using general English corpora, transformer models fine-tuned using a clinical corpus, and a large clinical transformer model, GatorTron, which was trained from scratch using 90 billion words of clinical text. We compared transformer models with two baseline models including a recurrent neural network implemented using bidirectional long short-term memory with a conditional random fields layer and support vector machines. RoBERTa-mimic achieved the best F1-score of 0.9279 for nodule concept and nodule characteristics extraction. ALBERT-base and GatorTron achieved the best F1-score of 0.9737 in linking nodule characteristics to pulmonary nodules. Seven out of eight transformers achieved the best F1-score of 1.0000 for negation detection. Our end-to-end system achieved an overall F1-score of 0.8869. This study demonstrated the advantage of state-of-the-art transformer models for pulmonary nodule information extraction from radiology reports.
更多
查看译文
关键词
Pulmonary nodule,Nodule characteristics,Natural language processing,Deep learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要