Text Mining Drug-Protein Interactions using an Ensemble of BERT, Sentence BERT and T5 models

biorxiv(2021)

引用 0|浏览1
暂无评分
摘要
In this work, we trained an ensemble model for predicting drug-protein interactions within a sentence based on only its semantics. Our ensembled model was built using three separate models: 1) a classification model using a fine-tuned BERT model; 2) a fine-tuned sentence BERT model that embeds every sentence into a vector; and 3) another classification model using a fine-tuned T5 model. In all models, we further improved performance using data augmentation. For model 2, we predicted the label of a sentence using k-nearest neighbors with its embedded vector. We also explored ways to ensemble these 3 models: a) we used the majority vote method to ensemble these 3 models; and b) based on the HDBSCAN clustering algorithm, we trained another ensemble model using features from all the models to make decisions. Our best model achieved an F-1 score of 0.753 on the BioCreative VII Track 1 test dataset. ### Competing Interest Statement The authors have declared no competing interest.
更多
查看译文
关键词
sentence bert,drug-protein
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要