Remote homolog detection places insect chemoreceptors in a cryptic protein superfamily spanning the tree of life

bioRxiv (2023)

Abstract
Many proteins exist in the so-called “twilight zone” of sequence alignment, where low pairwise sequence identity makes it difficult to determine homology and phylogeny [1], [2]. As protein tertiary structure is often more conserved [3], recent advances in ab initio protein folding have made structure-based identification of putative homologs feasible [4]–[6]. However, structural screening and phylogenetics are in their infancy, particularly for twilight zone proteins. We present a pipeline for the identification and characterization of distant homologs, and apply it to 7-transmembrane domain ion channels (7TMICs), a protein group founded by insect Odorant and Gustatory receptors. Previous sequence-based and limited structure-based searches identified putatively related proteins, mainly in other animals and plants [7]–[10]. However, very few 7TMICs have been identified in non-animal, non-plant taxa. Moreover, these proteins’ remarkable sequence dissimilarity made it uncertain whether disparate 7TMIC types (Gr/Or, Grl, GRL, DUF3537, PHTF and GrlHz) are homologous or convergent, leaving their evolutionary history unresolved. Our pipeline identified thousands of new 7TMICs in archaea, bacteria and unicellular eukaryotes. Using graph-based analyses and protein language models to extract family-wide signatures, we demonstrate that 7TMICs have structure and sequence similarity, supporting homology. Through sequence- and structure-based phylogenetics, we classify eukaryotic 7TMICs into two families (Class-A and Class-B), which are the result of a gene duplication predating the split(s) leading to Amorphea (animals, fungi and allies) and Diaphoretickes (plants and allies). Our work reveals 7TMICs as a cryptic superfamily with origins close to the evolution of cellular life. More generally, this study serves as a methodological proof of principle for the identification of extremely distant protein homologs.
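To make the "twilight zone" notion concrete, the following is a minimal sketch of how pairwise sequence identity might be computed from two already-aligned sequences. It is not the authors' pipeline: the function names, the toy sequences, and the 20-35% bounds (a commonly cited range, not one given in the abstract) are all assumptions for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where both sequences have a residue.

    Assumes the inputs are rows of a pairwise alignment, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def in_twilight_zone(identity: float, low: float = 20.0, high: float = 35.0) -> bool:
    """Flag identities in an assumed twilight-zone range (hypothetical default bounds)."""
    return low <= identity <= high


# Toy alignment (hypothetical sequences, not real 7TMICs).
toy_identity = percent_identity("MKT-LLVA", "MRTALL-G")
```

Identity depends on how the denominator is defined (aligned columns, shorter sequence, or full alignment length); this sketch uses gap-free aligned columns, which is one convention among several. At identities in or below the assumed range, sequence alone is an unreliable guide to homology, which is the motivation for the structure-based screening described above.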
### Competing Interest Statement

The authors have declared no competing interest.