Noise Reduction Learning Based on XLNet-CRF for Biomedical Named Entity Recognition

IEEE/ACM Transactions on Computational Biology and Bioinformatics (2023)

Abstract
In recent years, Biomedical Named Entity Recognition (BioNER) systems, which extract information from the rapidly expanding biomedical literature, have mainly been based on deep neural networks. Transformer-based autoencoding language models that capture long-distance context have recently been applied to BioNER with great success. However, noise is introduced during both pre-training and fine-tuning, and these models lack an effective decoder for label dependencies, leaving considerable room for improvement. We propose two noise reduction models, Shared Labels and Dynamic Splicing, which encode with XLNet, a permutation language pre-training model, and decode with a Conditional Random Field (CRF). Evaluated on 15 biomedical named entity recognition datasets, the two models improved the average F1-score by 1.504 and 1.48, respectively, and achieved state-of-the-art performance on 7 of the datasets. Further analysis confirms the effectiveness of both models, shows that the CRF improves recognition, and suggests the applicable scope of each model according to data characteristics.
Keywords
BioNLP, biomedical named entity recognition, noise reduction model, permutation language model, XLNet, conditional random field
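
The abstract points to label dependency as the reason for decoding with a CRF. As a rough illustration only, the sketch below shows Viterbi decoding over a linear-chain CRF in plain Python. It is not the authors' Shared Labels or Dynamic Splicing model: the function name `viterbi_decode`, the three-label BIO tag set, and all scores are assumptions made up for this example. In the setting the abstract describes, the per-token emission scores would come from an encoder such as XLNet.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain CRF."""
    # best[t][y] is the best score of any label path ending in y at token t.
    best = [{y: emissions[0][y] for y in labels}]
    back = []  # back[t - 1][y] is the best previous label for y at token t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-token sentence (illustrative values only).
labels = ["O", "B", "I"]
emissions = [
    {"O": 1.0, "B": 0.8, "I": 0.0},
    {"O": 0.2, "B": 0.0, "I": 1.0},
    {"O": 1.0, "B": 0.0, "I": 0.0},
]
# All transitions score 0 except O -> I, which is invalid in BIO tagging.
transitions = {p: {y: 0.0 for y in labels} for p in labels}
transitions["O"]["I"] = -10.0

# Per-token argmax ignores label dependency and yields an invalid sequence.
greedy = [max(labels, key=lambda y: e[y]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, labels)
print(greedy)   # ['O', 'I', 'O']
print(decoded)  # ['B', 'I', 'O']
```

Choosing each token's label independently produces an `I` tag directly after `O`, whereas scoring the whole sequence with the transition penalty recovers a valid entity span. This is the kind of label dependency a CRF decoder is meant to capture.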