A novel fusion technology utilizing complex network and sequence information for FAD-binding site identification

Lichao Zhang, Kang Xiao, Xueting Wang, Liang Kong

ANALYTICAL BIOCHEMISTRY (2024)

Abstract
Flavin adenine dinucleotide (FAD) binding sites play an increasingly important role as targets for inhibiting bacterial infections. To exploit protein topological structure information as a complement for identifying FAD-binding sites, we designed a novel fusion technology based on sequence and complex network information. The specially designed feature vectors were combined and fed into CatBoost for model construction. Moreover, because the minority class (positive samples) is more significant for biological research, a random under-sampling technique was applied to address the class imbalance. Compared with previous methods, our methods achieved the best results on two independent test datasets. In particular, the Matthews correlation coefficient (MCC) values obtained by FADsite and FADsite_seq were 14.37% to 53.37% and 21.81% to 60.81% higher than those of existing methods on Test6, and they showed improvements of 6.03% to 21.96% and 19.77% to 35.70% on Test4. Meanwhile, statistical tests show that our methods differ significantly from the state-of-the-art methods, and the cross-entropy loss shows that their predictions have high certainty. These results demonstrate the effectiveness of using sequence and complex network information in identifying FAD-binding sites, and the approach may be complementary to other biological studies. The data and source code are available at https://github.com/Kangxiaoneuq/FADsite.
Keywords
FAD-binding site, Sequence information, Complex network, Random under-sampling technique, CatBoost classifier
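
As a rough illustration of two ingredients named in the abstract, the sketch below shows random under-sampling of the majority (negative) class and the computation of MCC from binary predictions. It is not the authors' implementation: the function names `random_undersample` and `mcc`, the 1:1 positive-to-negative ratio, and the toy labels are all assumptions made for this example, and the CatBoost model itself is left out.

```python
import math
import random


def random_undersample(labels, seed=0):
    """Return sorted indices of a balanced subset (hypothetical helper).

    Keeps every positive sample (label 1) and draws an equal number of
    negatives (label 0) at random. The 1:1 ratio is an assumption; the
    abstract does not state the ratio that was used.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    return sorted(pos + kept_neg)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy residue labels: 2 binding residues among 10 (illustrative only).
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
balanced_idx = random_undersample(labels, seed=42)
balanced_labels = [labels[i] for i in balanced_idx]
```

In a pipeline of this kind, under-sampling would normally be applied only to the training set, so that MCC on the independent test sets reflects the original class imbalance.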