Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models
CoRR(2024)
Abstract
Difficult-to-treat depression (DTD) has been proposed as a broader and more
clinically comprehensive perspective on a person's depressive disorder, in which
the person continues to experience significant burden despite treatment. We
sought to develop a Large Language Model (LLM)-based tool capable of
interrogating routinely collected, narrative (free-text) electronic health
record (EHR) data to locate published prognostic factors that capture the
clinical syndrome of DTD. In this work, we use synthetic data generated by an
LLM (GPT-3.5) and a Non-Maximum Suppression (NMS) algorithm to train a
BERT-based span extraction model. The resulting model can extract and label
spans related to a variety of relevant positive and negative factors in real
clinical data (i.e., spans of text that increase or decrease the likelihood of
a patient matching the DTD syndrome). We show that, by training the model
exclusively on synthetic data, it is possible to obtain good overall
performance on real clinical data (0.70 F1 across polarity) on a set of as
many as 20 different factors, and high performance (0.85 F1 with 0.95
precision) on a subset of important DTD factors such as history of abuse,
family history of affective disorder, illness severity and suicidality. Our
results show promise for future healthcare applications, especially those
where highly confidential medical data and human-expert annotation would
traditionally be required.
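The abstract does not say exactly how NMS is applied to the span extractor. As a rough illustration only, the sketch below shows generic greedy NMS over scored, labeled text spans. It keeps the highest-scoring candidate and discards lower-scoring candidates that overlap it too much. The `Span` type, the token-level IoU overlap measure, the 0.5 threshold and the example labels are all hypothetical assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    start: int    # inclusive token index (hypothetical representation)
    end: int      # exclusive token index
    label: str    # hypothetical factor label, e.g. "suicidality"
    score: float  # model confidence for this span/label


def span_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two half-open token ranges."""
    inter = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def span_nms(spans: List[Span], iou_threshold: float = 0.5) -> List[Span]:
    """Greedy NMS (assumed variant): visit spans from highest to lowest score
    and keep a span only if it does not overlap any already-kept span by more
    than iou_threshold. Suppression here ignores labels (class-agnostic)."""
    kept: List[Span] = []
    for cand in sorted(spans, key=lambda s: s.score, reverse=True):
        if all(span_iou(cand, k) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept


# Usage: two heavily overlapping "suicidality" candidates and one separate span.
candidates = [
    Span(3, 8, "suicidality", 0.91),
    Span(4, 8, "suicidality", 0.80),
    Span(12, 15, "history_of_abuse", 0.75),
]
print(span_nms(candidates))  # keeps the spans at (3, 8) and (12, 15)
```

This version suppresses overlapping spans regardless of label. A per-label variant would compare a candidate only against kept spans that carry the same label, which allows differently labeled factors to share the same text.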