Simulations of sequence evolution: how (un)realistic they really are and why

Johanna Trost, Julia Haag, Dimitri Höhler, Luca Nesterenko, Laurent Jacob, Alexandros Stamatakis, Bastien Boussau

bioRxiv (2023)

Abstract
Motivation: Simulating sequence evolution plays an important role in the development and evaluation of phylogenetic inference tools. Naturally, the simulated data needs to be as realistic as possible to be indicative of the performance of the developed tools on empirical data. Over the years, numerous phylogenetic sequence simulators, employing various models of evolution, have been published with the goal of simulating such empirical-like data. In this study, we simulated DNA and protein Multiple Sequence Alignments (MSAs) under increasingly complex models of evolution, with and without insertion/deletion (indel) events, using a state-of-the-art sequence simulator. We assessed their realism by quantifying how well supervised learning methods are able to predict whether a given MSA is simulated or empirical.

Results: Our results show that we can distinguish between empirical and simulated MSAs with high accuracy using two distinct and independently developed classification approaches across all tested models of sequence evolution. Our findings suggest that the current state-of-the-art models fail to accurately replicate the process of evolution.

Data and Code Availability: All simulated and empirical MSAs, as well as all analysis results, are available at . All scripts required to reproduce our results are available at and .

Contact: julia.haag{at}h-its.org

Competing Interest Statement: The authors have declared no competing interest.