A data-driven evaluation of Arabidopsis-centric research and the model species concept

Sourabh Palande, John Arsenault, Patricia Basurto-Lozada, Andrew Bleich, Brianna N.I. Brown, Sophia F Buysse, Noelle A Connors, Sondipon Adhikari, Kara C. Dobson, Francisco Xavier Guerra-Castillo, Maria F Guerrero-Carrillo, Stephen J. Harlow, Hector Herrera-Orozco, Asia T Hightower, Paulo Izquierdo, MacKenzie Jacobs, Nan E. Johnson, Wendy Leuenberger, Alejandro López-Hernández, Alicia Luckie-Duque, Camila Martinez-Avila, Eddy Mendoza-Galindo, David Cruz Plancarte, Jörg Schuster, Harry Shomer, Sidney C Sitar, Anne K Steensma, J. Thomson, Damián Villaseñor-Amador, Robin Waterman, Brandon Webster, Mkb Whyte, Sofía Zorrilla‐Azcué, Beronda L. Montgomery, Aman Y. Husbands, Arjun Krishnan, Sarah Percival, Elizabeth Munch, Robert VanBuren, Daniel H. Chitwood, Alejandra Rougon‐Cardoso

bioRxiv (Cold Spring Harbor Laboratory) (2023)

ABSTRACT The selection of Arabidopsis as a model organism played a pivotal role in advancing genomic science, firmly establishing the cornerstone of modern plant molecular biology. Competing frameworks to select an agricultural- or ecological-based model species, or to decentralize plant science and study a multitude of diverse species, were selected against in favor of building core knowledge in a species that would facilitate genome-enabled research that could presumably be transferred to other plants. Here, more than twenty years after the Arabidopsis genome was sequenced, a period during which sequencing data from other plant species have accumulated and computation enabling machine learning has evolved, we critically examine the ability of models based on Arabidopsis gene expression data to predict tissue identity in other flowering plant species. Across the machine learning algorithms compared, models trained and tested on Arabidopsis data achieved precision and recall values of up to 0.99 using the K-Nearest Neighbor method, whereas when tissue identity is predicted across the flowering plants using models trained on Arabidopsis data, precision values range from 0.70 to 0.75 and recall from 0.55 to 0.64, depending on the algorithm used. Below-ground tissue is more predictable than other tissue types, and the ability to predict tissue identity is not correlated with phylogenetic distance from Arabidopsis. Our data-driven results suggest that, in hindsight, the assertion that knowledge from Arabidopsis is translatable to other plants is not as strong as originally assumed, and that in the current era where sequencing data and computation abound, we should decentralize the scientific focus on Arabidopsis and embrace plant diversity.
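The comparison described in the abstract (train on one species, then score predictions within that species and across species using precision and recall) can be illustrated with a minimal sketch. This is not the authors' pipeline: the two-feature "expression" vectors, the tissue labels, the value of k, and the names `knn_predict` and `precision_recall` are all hypothetical and chosen only to show how the two metrics are computed for a K-Nearest Neighbor classifier. The numbers it produces are unrelated to the values reported in the paper.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def precision_recall(y_true, y_pred, label):
    """Precision and recall for one tissue label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy data: two made-up expression features per sample.
train_X = [(9.0, 1.0), (8.5, 1.5), (9.5, 0.8), (1.0, 9.0), (1.5, 8.5), (0.8, 9.4)]
train_y = ["root", "root", "root", "leaf", "leaf", "leaf"]

# Held-out samples from the same (training) species.
within_X = [(9.2, 1.1), (1.2, 8.8)]
within_true = ["root", "leaf"]

# Samples from a different, invented species whose expression is shifted.
cross_X = [(6.0, 4.0), (5.5, 4.8), (2.0, 8.0), (7.0, 2.0)]
cross_true = ["root", "leaf", "leaf", "root"]

within_pred = [knn_predict(train_X, train_y, x) for x in within_X]
cross_pred = [knn_predict(train_X, train_y, x) for x in cross_X]

for name, y_true, y_pred in [("within", within_true, within_pred),
                             ("cross", cross_true, cross_pred)]:
    for label in ("root", "leaf"):
        p, r = precision_recall(y_true, y_pred, label)
        print(f"{name} species, {label}: precision={p:.2f} recall={r:.2f}")
```

In this toy setup the shifted cross-species samples are where misclassification appears, which is the kind of gap between within-species and cross-species performance that precision and recall are meant to expose.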
Keywords
Arabidopsis-centric, model species, research, data-driven