Distilling a High Precision Drug Adverse Effect Benchmark Using Wikipedia’s Wisdom of the Crowd (Preprint)

crossref(2022)

Abstract
BACKGROUND Computational methods for identifying adverse events following medical interventions require an accurate and relevant “gold standard” for evaluation and comparison. An appealing possibility is to use datasets of adverse effects of drugs or vaccines. The available large datasets typically attain their size by relying on automatic or semi-automatic generation methods. This often compromises the precision of the generated data, a problem that can be at least partially alleviated by having experts curate the data. However, such curation tends to be costly and time-consuming.

OBJECTIVE To improve accuracy without resorting to manual curation, we aim to automatically extract the expert knowledge accumulated in Wikipedia, using natural language processing technology.

METHODS To curate a dataset of adverse drug effects (ADEs), we suggest retrieving the Wikipedia page associated with the drug and checking whether the ADE appears in the sections describing adverse effects.

RESULTS We use this method to distill two large adverse drug effect datasets, SIDER and OFFSIDES, and evaluate the resulting datasets against the originals on two small ground-truth sets. For example, distilling the SIDER dataset to 7.2% of its size and evaluating the result on a ground truth derived from clinical trials improves precision from 0.14 to 0.86 and F1-score from 0.23 to 0.75.

CONCLUSIONS To accurately evaluate and compare algorithms that infer drug-ADE relations, there is a need for high-precision benchmark sets. The method suggested here is shown to yield such sets and may be of more general use in distilling medical data.
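
The filtering step described under METHODS, and the precision and F1 evaluation reported under RESULTS, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the page text is already available as wikitext-style plain text (no retrieval is performed), that adverse-effect sections can be recognized by heading keywords, and that an ADE "appears" when its name occurs as a substring. The abstract specifies none of these details, so the heading list, the function names, and the toy data are all hypothetical.

```python
import re

# Hypothetical heading keywords; the abstract does not name the sections used.
ADVERSE_HEADINGS = ("adverse effects", "side effects")


def adverse_sections(wikitext):
    """Return the lower-cased text of sections whose heading matches ADVERSE_HEADINGS.

    Simplification: every heading level starts a new section, so text under a
    subsection of an adverse-effects section is not included.
    """
    # Split on wikitext-style headings such as "== Adverse effects ==".
    # With one capture group, re.split yields
    # [preamble, heading1, body1, heading2, body2, ...].
    parts = re.split(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", wikitext, flags=re.MULTILINE)
    kept = []
    for heading, body in zip(parts[1::2], parts[2::2]):
        if any(key in heading.lower() for key in ADVERSE_HEADINGS):
            kept.append(body.lower())
    return "\n".join(kept)


def distill(pairs, pages):
    """Keep (drug, ade) pairs whose ADE name occurs in the drug page's adverse-effect sections."""
    kept = []
    for drug, ade in pairs:
        text = adverse_sections(pages.get(drug, ""))  # a missing page keeps nothing
        if ade.lower() in text:  # assumed matching rule: plain substring search
            kept.append((drug, ade))
    return kept


def precision_recall_f1(predicted, truth):
    """Set-based precision, recall, and F1 of predicted pairs against ground-truth pairs."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data, invented purely for illustration.
pages = {
    "drugx": (
        "Intro mentioning headache research.\n"
        "== Medical uses ==\nUsed for pain.\n"
        "== Adverse effects ==\nCommon effects include nausea and dizziness.\n"
        "== History ==\nEarly reports of rash were unconfirmed.\n"
    )
}
pairs = [("drugx", "nausea"), ("drugx", "rash"), ("drugx", "headache"), ("drugy", "nausea")]
distilled = distill(pairs, pages)
```

In this toy run only the pair whose ADE is named inside the adverse-effects section survives. Pairs mentioned elsewhere on the page, or belonging to a drug with no page, are dropped, which is how a filter of this kind trades dataset size for precision.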