A natural language processing system for the efficient updating of highly curated pathophysiology mechanism knowledge graphs

Negin Sadat Babaiha, Hassan Elsayed,Bide Zhang, Abish Kaladharan, Priya Sethumadhavan,Bruce Schultz, Jürgen Klein, Bruno Freudensprung,Vanessa Lage-Rupprecht,Alpha Tom Kodamullil,Marc Jacobs,Stefan Geissler,Sumit Madan,Martin Hofmann-Apitius

Artificial Intelligence in the Life Sciences(2023)

引用 0|浏览10
暂无评分
摘要
Biomedical knowledge graphs (KG) have become crucial for describing biological findings in a structured manner. To keep up with the constantly changing flow of knowledge, their embedded information must be regularly updated with the latest findings. Natural language processing (NLP) has created new possibilities for automating this upkeep by facilitating information extraction from free text. However, due to annotated and labeled biomedical data limitations, the development of completely autonomous information extraction systems remains a substantial scientific and technological hurdle. This study aims to explore methodologies best suited to support the automatic extraction of causal relationships from biomedical literature with the aim of regular and rapid updating of disease-specific pathophysiology mechanism KGs. Our proposed approach first searches and retrieves PubMed abstracts using the desired terms and keywords. The extension corpora are then passed through the NLP pipeline for automatic information extraction. We then identify triples representing cause-and-effect relationships and encode this content using the Biological Expression Language (BEL). Finally, domain experts perform an analysis of the completeness, relevance, accuracy, and novelty of the extracted triples. In our test scenario, which is focused on the KG regarding the phosphorylation of the Tau protein, our pipeline successfully contributed novel data, which was then subsequently used to update the KG leading to the identification of six additional upstream regulators of Tau phosphorylation. Here, it is demonstrated that the NLP-based workflow we created is capable of rapidly updating pathophysiology mechanism graphs. As a result, production-scale, semi-automated updating of pre-existing, curated mechanism graphs is enabled.
更多
查看译文
关键词
Knowledge graphs,Relation extraction,Natural language processing,Biomedical text mining,Biological expression language (BEL),Human brain pharmacome (HBP)
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要