Data Augmentation for BERT in the Medication Extraction Task of BioCreative VII

semanticscholar(2021)

Abstract
Identifying medical entities such as diseases and medications mentioned in short, informal, and noisy social media text is challenging. We participated in Track 3 of the BioCreative VII challenge, whose goal is to extract mentions of medications or dietary supplements in tweets. We used different solutions based on BERT and BiLSTM (bidirectional long short-term memory) to develop our system under a highly unbalanced data distribution. Four systems were developed for the task: the original BERT fine-tuned on the official training set, BERT with data augmentation (BERT-DA), BiLSTM, and BiLSTM with the focal loss. Owing to the limited time for producing predictions on the test set, we submitted only two results (BERT and BERT-DA) for evaluation. The best-performing model we submitted is BERT-DA, which obtained an F1-score of 70.4%. The evaluation results confirmed the effectiveness of the proposed data augmentation method, which greatly improves the recall of the developed system.

Keywords: Social media; named entity recognition; data imbalance
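
The abstract names the focal loss as one remedy for the unbalanced label distribution but does not give its form or settings. As a rough illustration only, the sketch below uses the standard definition, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), for a single token. The function names, the three-label probability vectors, and the values gamma=2.0 and alpha=1.0 are assumptions chosen for the example, not details taken from the paper.

```python
import math

def cross_entropy(probs, target):
    """Plain cross-entropy for one token: -log(p_t)."""
    return -math.log(probs[target])

def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one token: -alpha * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are illustrative defaults (assumed, not from the paper).
    With gamma=0 and alpha=1 this reduces to plain cross-entropy.
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical softmax outputs over three labels, e.g. (O, B-Drug, I-Drug).
easy = [0.95, 0.03, 0.02]  # true label 0: a confidently correct majority-class token
hard = [0.60, 0.30, 0.10]  # true label 1: a poorly predicted minority-class token

# Fraction of the cross-entropy that survives the focal down-weighting.
easy_ratio = focal_loss(easy, 0) / cross_entropy(easy, 0)
hard_ratio = focal_loss(hard, 1) / cross_entropy(hard, 1)
print(easy_ratio, hard_ratio)
```

The factor (1 - p_t)^gamma shrinks the contribution of tokens the model already classifies confidently, which in tweet data are mostly non-medication tokens, so the rarer and harder medication tokens account for a larger share of the total loss.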