A Comprehensive Approach Characterizing Fusion Proteins and Their Interactions Using Biomedical Literature

bioRxiv(2018)

引用 1|浏览7
暂无评分
摘要
Todays increase in scientific literature requires the efficient methods of data mining for improving the extraction of the useful information from texts. In this manuscript, we used a data and text mining method to identify fusions and their protein-protein interactions from published biomedical text. The extracted fusion proteins and their protein-protein interactions are used as a training set for a Naive Bayes classifier that is further used for final identification of testing dataset, consisting of 1817 fusions. Our method has a literature corpus, text and annotation mappers; keywords, rule bases, negative tokens, and pattern extractor; synonym tagger, normalization, regular expression mapper; and Naive Bayes classifier. We classified 1817 unique fusion proteins and their corresponding 2908 protein-protein interactions for 18 cancer types. Therefore, it can be used for screening literature for identifying mentions unique cases of fusions that can be further used for downstream analysis. It is available at http://protfus.md.biu.ac.il/.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要