Estimating amino acid substitution models from genome datasets: a simulation study on the performance of estimated models

Nguyen Huy Tinh, Cuong Cao Dang, Le Sy Vinh

Journal of Evolutionary Biology (2024)

Abstract
Estimating the parameters of amino acid substitution models is a crucial task in bioinformatics. The maximum likelihood (ML) approach has been proposed for estimating amino acid substitution models from large datasets. The quality of newly estimated models is normally assessed by comparing them with existing models in building ML trees. Two important questions remain: how well the estimated models correlate with the true models, and how large the training datasets must be to estimate reliable models. In this article, we performed a simulation study to answer these two questions. We simulated genome datasets with different numbers of genes (alignments) based on predefined models (called true models) and predefined trees (called true trees). The simulated datasets were used to estimate amino acid substitution models using the ML estimation methods. Our experiments showed that models estimated by the ML methods from simulated datasets with more than 100 genes have high correlations with the true models. The estimated models performed well in building ML trees in comparison with the true models. The results suggest that amino acid substitution models estimated by the ML methods from large genome datasets are a reliable tool for analyzing amino acid sequences.
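As a rough illustration of what correlating an estimated model with a true model can look like, the sketch below compares two sets of exchangeability rates with a Pearson correlation. This is only a sketch under stated assumptions: the abstract does not say which correlation measure or which parameters were compared, the "true" rates here are random placeholders rather than a published model, and the "estimated" rates are produced by adding noise instead of running an ML estimation. All function and variable names are hypothetical.

```python
import math
import random

N_STATES = 20  # number of amino acids


def random_exchangeabilities(rng):
    """Placeholder upper-triangle exchangeability rates (190 values for 20 states)."""
    n_pairs = N_STATES * (N_STATES - 1) // 2
    return [rng.uniform(0.1, 10.0) for _ in range(n_pairs)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumption: stands in for the predefined ("true") model.
true_rates = random_exchangeabilities(rng)

# Assumption: stands in for an ML estimate; here it is just the true rates
# perturbed by multiplicative log-normal noise, not a real estimation.
estimated_rates = [r * math.exp(rng.gauss(0.0, 0.1)) for r in true_rates]

r = pearson(true_rates, estimated_rates)
print(f"Pearson correlation between true and estimated rates: {r:.3f}")
```

In a real study the second vector would come from an ML estimation run on the simulated alignments, and the comparison could be repeated for datasets with different numbers of genes to see how the correlation changes with training set size.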
Keywords
amino acid substitution models, time-reversible models, time-nonreversible models, maximum likelihood estimation methods, simulated amino acid data