Using Machine Learning To Collect And Facilitate Remote Access To Biomedical Databases: Development Of The Biomedical Database Inventory

JMIR MEDICAL INFORMATICS(2021)

Abstract
Background: Existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases.
Objective: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databases automatically extracted from the scientific literature. BiDI provides an index of data resources and a path to access them seamlessly.
Methods: We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1242 articles that included mentions of database publications. This data set was used, along with transfer learning techniques, to train an ensemble of deep learning natural language processing models targeted at database publication detection.
Results: The system obtained an F1 score of 0.929 on database detection, showing high precision and recall values. Applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble model also extracted the weblinks to the reported databases and discarded irrelevant links. For the extraction of weblinks, the model achieved a cross-validated F1 score of 0.908. We show two use cases: one related to "omics" and the other related to the COVID-19 pandemic.
Conclusions: BiDI enables access to biomedical resources over the internet and facilitates data-driven research and other scientific initiatives. The repository is openly available online and will be regularly updated with an automatic text processing pipeline. The approach can be reused to create repositories of different types, biomedical and otherwise.
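The abstract does not say how the ensemble members are combined, so the following is only a minimal sketch under stated assumptions: it uses soft voting (averaging each model's per-article probability that the article describes a database) with a hypothetical 0.5 threshold, then scores the result with the standard F1 metric, the harmonic mean of precision and recall. The function names `soft_vote` and `f1_score` and the toy probabilities and labels are illustrative and are not taken from the paper.

```python
from typing import Sequence


def soft_vote(model_probs: Sequence[Sequence[float]], threshold: float = 0.5) -> list[int]:
    """Combine models by averaging per-article probabilities (assumed soft voting).

    model_probs[m][i] is model m's probability that article i is a database publication.
    Returns 1 for articles whose averaged probability reaches the (hypothetical) threshold.
    """
    n_models = len(model_probs)
    n_items = len(model_probs[0])
    averaged = [sum(p[i] for p in model_probs) / n_models for i in range(n_items)]
    return [1 if a >= threshold else 0 for a in averaged]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Standard binary F1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): three models scoring four articles.
probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.3, 0.7],
    [0.7, 0.4, 0.5, 0.6],
]
labels = [1, 0, 1, 1]  # hypothetical gold annotations

preds = soft_vote(probs)
print(preds)                                # [1, 0, 0, 1]
print(round(f1_score(labels, preds), 3))    # 0.8
```

Averaging probabilities lets a confident model outweigh two lukewarm ones, whereas hard majority voting would count each model equally; which scheme BiDI actually uses is not stated in the abstract.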
Keywords
biomedical databases, natural language processing, deep learning, internet, biomedical knowledge