Iterative deletion of gene trees detects extreme biases in distance-based phylogenomic coalescent analyses

bioRxiv (2022)

Abstract
Summary coalescent methods offer an alternative to the concatenation (supermatrix) approach for inferring phylogenetic relationships from genome-scale datasets. Given huge datasets, broad congruence between contrasting phylogenomic paradigms is often obtained, but empirical studies commonly show some well-supported conflicts between concatenation and coalescence results and also between species trees estimated from alternative coalescent methods. Partitioned support indices can help arbitrate these discrepancies by pinpointing outlier loci that are unjustifiably influential at conflicting nodes. Partitioned coalescence support (PCS) was recently developed for summary coalescent methods, such as ASTRAL and MP-EST, that use the summed fits of individual gene trees to estimate the species tree. However, PCS cannot be implemented when distance-based coalescent methods (e.g., STAR, NJst, ASTRID, STEAC) are applied. Here, this deficiency is addressed by automating computation of ‘partitioned coalescent branch length’ (PCBL), a novel index that uses iterative removal of individual gene trees to assess the impact of each gene on every clade in a distance-based coalescent tree. Reanalyses of five phylogenomic datasets show that PCBL for STAR and NJst trees helps quantify the overall stability/instability of clades and clarifies disagreements with results from optimality-based coalescent analyses. PCBL scores reveal severe ‘missing taxa’, ‘apical nesting’, ‘misrooting’, and ‘basal dragdown’ biases. Contrived examples demonstrate the gross overweighting of outlier gene trees that drives these biases. Because of interrelated biases revealed by PCBL scores, caution should be exercised when using STAR and NJst, in particular when many taxa are analyzed, missing data are non-randomly distributed, and widespread gene-tree reconstruction error is suspected.
Similar biases in the optimality-based coalescent method MP-EST indicate that congruence among species trees estimated via STAR, NJst, and MP-EST should not be interpreted as independent corroboration for phylogenetic relationships. Such agreements among methods instead might be due to the common defects of all three summary coalescent methods.

Competing Interest Statement: The authors have declared no competing interest.

Abbreviations:
ASTRAL: accurate species tree algorithm
ASTRID: accurate species trees from internode distances
bp: base pair
ILS: incomplete lineage sorting
ML: maximum likelihood
MP-EST: maximum pseudo-likelihood for estimating species trees
NJst: neighbor joining species tree
PCS: partitioned coalescence support
PP: posterior probability
STAR: species tree estimation using average ranks of coalescences
STEAC: species tree estimation using average coalescence times
UCE: ultraconserved element
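
The iterative-deletion idea behind PCBL can be illustrated with a small sketch. The Python code below is not the authors' implementation and does not reproduce STAR or NJst exactly. It rests on several assumptions made here for illustration: gene trees are rooted nested tuples of taxon names, the pairwise distance is the number of edges between two leaves averaged over gene trees (a simplified stand-in for NJst internode distances), the species tree is built with textbook neighbor joining, every pair of taxa co-occurs in at least one gene tree, and the score for a gene at a clade is the branch length in the full tree minus the branch length after that gene is removed (with an absent clade counted as length zero). All function names are hypothetical.

```python
from itertools import combinations

def leaf_depths(tree, depth=0):
    """Map each leaf name to its depth (in edges) below the starting node."""
    if isinstance(tree, str):
        return {tree: depth}
    out = {}
    for child in tree:
        out.update(leaf_depths(child, depth + 1))
    return out

def pair_distances(tree):
    """Edge count between every pair of leaves in one rooted gene tree."""
    if isinstance(tree, str):
        return {}
    dists = {}
    for child in tree:
        dists.update(pair_distances(child))
    child_depths = [leaf_depths(child, 1) for child in tree]
    for left, right in combinations(child_depths, 2):
        for a, da in left.items():
            for b, db in right.items():
                dists[frozenset((a, b))] = da + db
    return dists

def mean_distances(gene_trees):
    """Average each pairwise distance over the gene trees containing both taxa."""
    total, count = {}, {}
    for tree in gene_trees:
        for pair, d in pair_distances(tree).items():
            total[pair] = total.get(pair, 0) + d
            count[pair] = count.get(pair, 0) + 1
    return {pair: total[pair] / count[pair] for pair in total}

def neighbor_joining(dist):
    """Textbook NJ. Returns {cluster: branch length}, cluster = frozenset of taxa."""
    d = {frozenset(frozenset([t]) for t in pair): v for pair, v in dist.items()}
    nodes = sorted({n for key in d for n in key}, key=sorted)
    lengths = {}
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        x, y = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dxy = d[frozenset((x, y))]
        lengths[x] = 0.5 * dxy + (r[x] - r[y]) / (2 * (n - 2))
        lengths[y] = dxy - lengths[x]
        new = x | y
        for z in nodes:
            if z not in (x, y):
                d[frozenset((new, z))] = 0.5 * (
                    d[frozenset((x, z))] + d[frozenset((y, z))] - dxy)
        nodes = [z for z in nodes if z not in (x, y)] + [new]
    x, y = nodes
    lengths[x] = lengths[y] = d[frozenset((x, y))]  # one branch, seen from both sides
    return lengths

def clade_lengths(gene_trees):
    """Branch lengths keyed by the side of each split that excludes a reference taxon."""
    lengths = neighbor_joining(mean_distances(gene_trees))
    taxa = frozenset().union(*lengths)
    ref = min(taxa)  # assumption: alphabetically first taxon anchors every split
    return {(c if ref not in c else taxa - c): v for c, v in lengths.items()}

def leave_one_out_scores(gene_trees):
    """For each gene index: full-tree branch length minus length without that gene."""
    full = clade_lengths(gene_trees)
    scores = {}
    for i in range(len(gene_trees)):
        reduced = clade_lengths(gene_trees[:i] + gene_trees[i + 1:])
        # Assumption: a clade missing from the reduced tree counts as length 0.
        scores[i] = {c: length - reduced.get(c, 0.0) for c, length in full.items()}
    return scores
```

Calling `leave_one_out_scores` on a list of gene trees returns one dictionary per gene, keyed by clade. Under the sign convention assumed here, a positive score means the gene lengthens that branch in the full tree, and a negative score means the branch grows when the gene is removed, which is the pattern expected for a conflicting outlier gene tree.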
Keywords
phylogenomic coalescent,gene trees,iterative deletion,extreme biases,distance-based