Methods for reducing the number of sequences in molecular evolutionary analyses

Meta Gene (2020)

Abstract
Owing to progress in sequencing technology, the number of pathogen nucleotide sequences deposited in public databases has been increasing rapidly. Consequently, in molecular evolutionary analyses of pathogens it may occasionally be impractical to include all available sequences, and the number of sequences must be reduced so that the computation finishes within a realistic time frame. Here, several methods for reducing the number of sequences were evaluated by the amount of evolutionary information contained in the retained sequences, measured as the total branch length of the phylogenetic tree (L). In the REA (random elimination in alignment) method, each sequence was eliminated with equal probability. In the phylogenetic tree-based methods, sequences associated with short exterior branches were eliminated; in the CNT (closest neighbor in tree) method, a sequence to be eliminated was required to form a neighboring pair with another sequence, whereas no such restriction was imposed in the SET (shortest exterior branch in tree) method. In the distance matrix-based methods, sequences with small average distances to other sequences were eliminated; in the CPM (closest pair in matrix) method, a sequence to be eliminated was required to be closely related to another sequence, whereas no such restriction was imposed in the SDM (smallest average distance in matrix) method. In analyses of 2113 sequences of norovirus viral protein 1 and 13,063 sequences of influenza A virus hemagglutinin, the CPM method was the most useful for obtaining a large L, with the exterior branch length (LE) tending to be elongated. In contrast, the SDM method tended to elongate the interior branch length (LI), raising the LI/LE ratio, which may make it suitable for phylogenetic analysis.
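The REA baseline and the two distance matrix-based methods can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: distances are taken to be uncorrected p-distances (proportion of differing aligned sites, with gap sites skipped), sequences are eliminated one at a time with average distances recomputed after each removal, and in CPM the member of the closest pair with the smaller average distance is assumed to be the one eliminated. The function names (`reduce_rea`, `reduce_sdm`, `reduce_cpm`) are hypothetical. The tree-based CNT and SET methods would additionally need a phylogenetic tree and are not shown.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites; sites with a gap in either sequence are skipped (assumption)."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)


def distance_matrix(seqs: Sequence[str]) -> Matrix:
    """Symmetric matrix of pairwise p-distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = p_distance(seqs[i], seqs[j])
    return d


def mean_distance(d: Matrix, i: int, kept: List[int]) -> float:
    """Average distance from sequence i to the other retained sequences."""
    others = [j for j in kept if j != i]
    return sum(d[i][j] for j in others) / len(others)


def _check(n: int, target: int) -> None:
    if not 2 <= target <= n:
        raise ValueError("target must satisfy 2 <= target <= number of sequences")


def reduce_rea(n: int, target: int, seed: int = 0) -> List[int]:
    """REA: keep a uniformly random subset, so every sequence is equally likely to be eliminated."""
    _check(n, target)
    return sorted(random.Random(seed).sample(range(n), target))


def reduce_sdm(d: Matrix, target: int) -> List[int]:
    """SDM: repeatedly eliminate the sequence with the smallest average distance (no pairing restriction)."""
    _check(len(d), target)
    kept = list(range(len(d)))
    while len(kept) > target:
        victim = min(kept, key=lambda i: mean_distance(d, i, kept))
        kept.remove(victim)
    return kept


def reduce_cpm(d: Matrix, target: int) -> List[int]:
    """CPM: eliminate one member of the closest remaining pair of sequences."""
    _check(len(d), target)
    kept = list(range(len(d)))
    while len(kept) > target:
        i, j = min(
            ((a, b) for k, a in enumerate(kept) for b in kept[k + 1:]),
            key=lambda pair: d[pair[0]][pair[1]],
        )
        # Assumption: drop the pair member that is, on average, closer to the rest.
        victim = i if mean_distance(d, i, kept) <= mean_distance(d, j, kept) else j
        kept.remove(victim)
    return kept


if __name__ == "__main__":
    seqs = ["AAAAAAAAAA", "CCCAAAAAAA", "CCCCAAAAAA", "AAAAAAAGGG", "AAAAATTAAA"]
    d = distance_matrix(seqs)
    print("REA:", reduce_rea(len(seqs), 4, seed=1))
    print("SDM:", reduce_sdm(d, 4))  # [1, 2, 3, 4]
    print("CPM:", reduce_cpm(d, 4))  # [0, 2, 3, 4]
```

On these five toy sequences the two matrix-based rules diverge at the first step: SDM drops index 0, which sits centrally among the others, whereas CPM drops index 1, one member of the closest pair (indices 1 and 2).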
The nucleotide diversity (π), synonymous diversity (πS), nonsynonymous diversity (πN), and πN/πS ratio remained almost constant under the REA method but increased under the other methods, suggesting that the REA method may be appropriate for analyses of population diversity and natural selection.
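The contrast between REA and the other methods has a simple explanation when π is defined as the mean pairwise distance between sequences. Under uniform random elimination every pair of sequences is equally likely to survive, so the expected π of the retained set equals the π of the full set, whereas methods that preferentially drop closely related sequences remove the smallest pairwise distances and tend to push π upward. The sketch below assumes uncorrected p-distances between gap-free aligned nucleotide sequences (the paper's exact estimator may differ), and the helper names are hypothetical. πS and πN additionally require codon-aware counting of synonymous and nonsynonymous sites and differences (for example, the Nei-Gojobori approach), which is not attempted here.

```python
from itertools import combinations
from typing import Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """pi as the mean p-distance over all n(n-1)/2 pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_pi_over_subsets(seqs: Sequence[str], m: int) -> float:
    """Average pi over every subset of size m, i.e. the expected pi under uniform random elimination."""
    subsets = list(combinations(seqs, m))
    return sum(nucleotide_diversity(s) for s in subsets) / len(subsets)


if __name__ == "__main__":
    seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "AAAAAAAAAC", "TTTTTAAAAA"]
    print(nucleotide_diversity(seqs))       # about 0.3167 (1.9 / 6)
    print(mean_pi_over_subsets(seqs, 3))    # also about 0.3167
    # Dropping one of the two identical sequences removes a zero distance, so pi rises.
    print(nucleotide_diversity([seqs[0], seqs[1], seqs[3]]))  # about 0.4
```

The exhaustive average over all size-3 subsets matches the full-sample π, mirroring the near-constant π reported for REA, while deliberately removing a near-duplicate raises π, as reported for the other methods.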
Keywords
Distance matrix, Multiple alignment, Nucleotide diversity, Phylogenetic tree, Total branch length