Bridging information gaps in menopause status classification through natural language processing

JAMIA Open (2024)

Abstract
Objective: To use natural language processing (NLP) of clinical notes to augment existing structured electronic health record (EHR) data for classification of a patient's menopausal status.
Materials and methods: A rule-based NLP system was designed to capture evidence of a patient's menopause status, including the dates of the patient's last menstrual period, reproductive surgeries, and postmenopause diagnosis, as well as their use of birth control and menstrual interruptions. NLP-derived output was used in combination with structured EHR data to classify a patient's menopausal status. NLP processing and patient classification were performed on a cohort of 307 512 female Veterans receiving healthcare at the US Department of Veterans Affairs (VA).
Results: NLP was validated at 99.6% precision. Incorporating the NLP-derived data into a menopause phenotype increased the number of patients with data relevant to their menopausal status by 118%. Using structured codes alone, 81 173 (27.0%) patients could be classified as postmenopausal or premenopausal. With the inclusion of NLP, this number increased to 167 804 (54.6%) patients. The premenopausal category grew by 532.7% with the inclusion of NLP data.
Discussion: By employing NLP, it became possible to identify documented data elements that predate VA care, originate outside VA networks, or have no corresponding structured field in the VA EHR, and that would otherwise be inaccessible for further analysis.
Conclusion: NLP can be used to identify concepts relevant to a patient's menopausal status in clinical notes. Adding NLP-derived data to an algorithm classifying a patient's menopausal status significantly increases the number of patients classified using EHR data, ultimately enabling more detailed assessments of the impact of menopause on health outcomes.
Menopause is a significant milestone in women's health, but using it in research is difficult because of significant differences in the data available to each study, particularly when a study relies on historical data such as those found in electronic health record (EHR) systems. In this article, we use natural language processing (NLP) on clinical notes in the EHR to augment other forms of EHR data, such as records of diagnoses or surgical procedures, in order to classify patients as premenopausal or postmenopausal. Our NLP identified a patient's last menstrual period, any reproductive surgeries, and explicit diagnoses of menopause recorded in text. We evaluated our NLP algorithm on the EHR data of 307 512 female Veterans at the US Department of Veterans Affairs. Using only structured data, we could classify only 81 173 (27.0%) of the patients as premenopausal or postmenopausal, but after applying NLP to patients' clinical notes, 167 804 (54.6%) were classified. Doubling the number of patients who can be classified using EHR data allows future research to analyze the impact of menopause on women's health outcomes in more detail.
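As a rough illustration of how rule-based evidence from notes might be combined with a structured flag, the following is a minimal sketch only. The regular expressions, the function names `extract_evidence` and `classify`, the `structured_postmenopausal` parameter, and the 365-day window are all assumptions made for this example; they are not the rules, codes, or thresholds used in the study, and the sketch ignores negation, birth control, and menstrual interruptions.

```python
import re
from datetime import date

# Hypothetical patterns for illustration; the study's actual rules are not given here.
LMP_PATTERN = re.compile(
    r"\b(?:LMP|last menstrual period)\b\s*(?:was|is|on|:)?\s*(\d{1,2})/(\d{1,2})/(\d{4})",
    re.IGNORECASE,
)
SURGERY_PATTERN = re.compile(r"\bbilateral oophorectomy\b", re.IGNORECASE)
POSTMENOPAUSE_PATTERN = re.compile(r"\bpost-?\s?menopaus(?:e|al)\b", re.IGNORECASE)


def extract_evidence(note_text):
    """Pull simple menopause-related evidence out of one clinical note."""
    evidence = {"lmp": None, "surgery": False, "postmenopause": False}
    match = LMP_PATTERN.search(note_text)
    if match:
        month, day, year = (int(g) for g in match.groups())
        try:
            evidence["lmp"] = date(year, month, day)
        except ValueError:
            pass  # Impossible calendar date; treat as no LMP evidence.
    evidence["surgery"] = bool(SURGERY_PATTERN.search(note_text))
    evidence["postmenopause"] = bool(POSTMENOPAUSE_PATTERN.search(note_text))
    return evidence


def classify(note_text, note_date, structured_postmenopausal=False):
    """Combine a structured flag with note evidence (assumed toy logic)."""
    ev = extract_evidence(note_text)
    if structured_postmenopausal or ev["postmenopause"] or ev["surgery"]:
        return "postmenopausal"
    if ev["lmp"] is not None:
        days_since_lmp = (note_date - ev["lmp"]).days
        # Assumption: a period within the past year suggests premenopausal.
        if 0 <= days_since_lmp <= 365:
            return "premenopausal"
    return "unknown"


if __name__ == "__main__":
    visit = date(2023, 6, 1)
    print(classify("LMP: 03/14/2023. No concerns.", visit))
    print(classify("Patient is postmenopausal.", visit))
    print(classify("No complaints today.", visit))
```

In this sketch a note with no matching evidence stays "unknown" unless the structured flag is set, which mirrors the idea that text-derived evidence adds to, rather than replaces, structured data.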
Keywords
natural language processing, menopause, phenotyping, women's health