virSearcher: Identifying Bacteriophages from Metagenomes by Combining Convolutional Neural Network and Gene Information

IEEE/ACM Transactions on Computational Biology and Bioinformatics (2023)

Abstract
Metagenome sequencing provides an unprecedented opportunity for the discovery of unknown microbes and viruses. In metagenomes, large numbers of phages and prokaryotes are mixed together. To study the influence of phages on the human body and on environments, it is important to separate phage sequences from the rest of the metagenome. However, novel phages are difficult to identify because their sequences are highly diverse and because metagenomes frequently contain short contigs. Here, virSearcher is developed to identify phages from metagenomes by combining a convolutional neural network (CNN) with the gene information of the input sequences. First, an input sequence is encoded according to the different functions of its coding and non-coding regions, and is then converted into a word embedding code by a word embedding layer placed before a convolutional layer. In addition, the hit ratio of the virus genes is combined with the output of the CNN to further improve the performance of the network. The genes used by virSearcher include both complete and incomplete genes. Experiments on several metagenomes have shown that, compared with other methods, virSearcher significantly improves the identification of short sequences while maintaining performance on long ones. The source code of virSearcher is freely available from http://github.com/DrJackson18/virSearcher .
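The encoding step and the gene hit ratio described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the token vocabulary, the coordinate convention for coding regions, or how the hit ratio is computed, so everything below is an assumption made for illustration: the names `encode_sequence` and `virus_gene_hit_ratio`, the eight-token vocabulary (one token set for bases inside coding regions, another for bases outside them), and the definition of the hit ratio as the fraction of predicted genes that match a virus gene are all hypothetical and are not taken from the virSearcher source code.

```python
from typing import Dict, List, Tuple

# Hypothetical vocabulary (an assumption, not the paper's actual scheme):
# tokens 1-4 are A, C, G, T in non-coding regions,
# tokens 5-8 are A, C, G, T in coding regions,
# token 0 is any other symbol (for example N).
BASE_INDEX: Dict[str, int] = {"A": 1, "C": 2, "G": 3, "T": 4}
CODING_OFFSET = 4
UNKNOWN_TOKEN = 0


def encode_sequence(seq: str, coding_regions: List[Tuple[int, int]]) -> List[int]:
    """Map each base to an integer token that depends on whether it lies in a gene.

    coding_regions holds 0-based, half-open (start, end) intervals.
    The resulting token list is the kind of input a word embedding layer expects.
    """
    in_coding = [False] * len(seq)
    for start, end in coding_regions:
        # Clip each interval to the sequence bounds before marking it.
        for i in range(max(0, start), min(len(seq), end)):
            in_coding[i] = True

    tokens = []
    for base, coding in zip(seq.upper(), in_coding):
        index = BASE_INDEX.get(base)
        if index is None:
            tokens.append(UNKNOWN_TOKEN)
        else:
            tokens.append(index + (CODING_OFFSET if coding else 0))
    return tokens


def virus_gene_hit_ratio(gene_hits: List[bool]) -> float:
    """Fraction of predicted genes that matched a virus gene (assumed definition).

    gene_hits has one entry per predicted gene, complete or incomplete.
    """
    if not gene_hits:
        return 0.0
    return sum(gene_hits) / len(gene_hits)


if __name__ == "__main__":
    print(encode_sequence("ACGTN", [(1, 3)]))
    print(virus_gene_hit_ratio([True, False, True, False]))
```

In a full model, the token list would be passed to an embedding layer followed by convolutional layers, and the hit ratio would be supplied as an additional feature alongside the CNN output. How the two are merged is not stated in the abstract, so that step is left out of the sketch.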
Keywords
Convolutional neural network, gene related feature, incomplete gene, novel phage identification, word embedding