A pan-tissue, pan-disease compendium of human orphan genes

bioRxiv (2024)

Abstract
Species-specific genes are ubiquitous in evolution, with functions ranging from prey paralysis to survival in subzero temperatures. Because they are typically expressed only under limited conditions and lack canonical features, such genes may be vastly under-identified, even in humans. Here, we leverage terabytes of human RNA-Seq data to identify thousands of highly expressed transcripts that do not correspond to any Gencode-annotated gene. Many may be novel ncRNAs, although 80% of them contain ORFs that could encode proteins unique to Homo sapiens (orphan genes). We validate our findings with independent strand-specific and single-cell RNA-Seq datasets. Hundreds of these novel transcripts overlap with deleterious genomic variants, and thousands show significant association with disease-specific patient survival. Most are dynamically regulated and accumulate selectively in particular tissues, cell types, developmental stages, tumors, COVID-19, sexes, and ancestries. As such, these transcripts hold potential as diagnostic biomarkers or therapeutic targets. To empower future discovery, we provide a compendium of these large RNA-Seq expression datasets and Ribo-Seq data, with associated metadata. Further, we supply the gene models for the novel genes as UCSC Genome Browser tracks.

Competing Interest Statement
The authors have declared no competing interest.
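The abstract reports that about 80% of the novel transcripts contain ORFs with protein-coding potential. As a rough illustration of what such a check involves, the sketch below scans the three forward reading frames of a transcript for ATG-to-stop open reading frames. It is not the authors' pipeline: the function name `find_orfs` and the 30-codon minimum length are hypothetical choices made for this example, and only the given strand is scanned (as one would for a strand-specific transcript).

```python
# Hypothetical sketch, not the paper's method: find ATG...stop ORFs in the
# three forward frames of a transcript sequence.

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_CODONS = 30  # assumed illustrative cutoff, counted including the stop codon


def find_orfs(seq: str, min_codons: int = MIN_ORF_CODONS) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of ORFs on the given strand.

    An ORF starts at the first ATG in a frame and ends after the next in-frame
    stop codon; an ATG with no downstream in-frame stop is not reported.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    orfs = []
    for frame in range(3):
        start = None
        # Step codon by codon; the range bound keeps every slice 3 nt long.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)
```

For example, `find_orfs("ATGAAATAG", min_codons=3)` returns `[(0, 9)]`, while the default 30-codon threshold would discard this tiny ORF. Real analyses typically also scan the reverse strand for unstranded data and apply their own length and start-codon rules.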