Comparative genomics of the human genome and six bat genomes using AI: Mb-level CpG and TFBS islands

Yuki Iwasaki, Toshimichi Ikemura, Kennosuke Wada, Yoshiko Wada, Takashi Abe

Semantic Scholar (2022)

Abstract
Background: Emerging infectious disease RNA viruses, such as SARS-CoV-2 and Ebola virus, are thought to rely on bats as their natural reservoir hosts. Since these zoonotic viruses pose a great threat to humans, it is important to characterize the bat genome from multiple perspectives. Unsupervised artificial intelligence (AI) methods that extract novel information from big sequence data without prior knowledge or particular models are highly desirable for obtaining unexpected insights. We previously established a batch-learning self-organizing map (BLSOM) of oligonucleotide composition that reveals novel genome characteristics from big sequence data.

Results: In this study, we used the oligonucleotide BLSOM to conduct a comparative genome study of humans and bats. BLSOM is an explainable-type AI that reveals the diagnostic oligonucleotides contributing to sequence clustering (self-organization). When the unsupervised AI reveals unexpected and/or characteristic features, these features can be studied in more detail with the much simpler and more direct standard distribution map method. Based on this combined strategy, we identified Mb-level enrichment of CG (Mb-level CpG islands) around the termini of bat long-scaffold sequences. In addition, a class of CG-containing oligonucleotides was also enriched in the centromeric and pericentromeric regions of human chromosomes. Oligonucleotides longer than tetranucleotides often represent binding motifs for a wide variety of proteins, e.g., transcription factor binding sequences (TFBSs). By analyzing penta- and hexanucleotide compositions, we observed evident enrichment of a wide range of hexanucleotide TFBSs in centromeric and pericentromeric heterochromatin regions on all human chromosomes.

Conclusion: TFBSs that are enriched in centromeric and pericentromeric heterochromatin regions are thought to play an important role in the formation of nuclear 3D structures. Our AI-based analysis should help us to understand differential features of the nuclear 3D structures of the human and bat genomes.
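
The abstract does not specify how the standard distribution map is computed. As a rough illustration of the general idea only, the following Python sketch slides a fixed-size window along a sequence and reports the fraction of positions at which a given oligonucleotide (for example CG) occurs. The function name `oligo_distribution_map`, its parameters, and the normalization are assumptions made for this example, not the authors' implementation; a real analysis would use Mb-scale windows on genome scaffolds.

```python
def oligo_distribution_map(seq, oligo, window, step):
    """Return (window_start, occurrence_fraction) pairs for one oligonucleotide.

    Hypothetical helper for illustration: the fraction is the number of
    positions in the window where `oligo` starts, divided by the number of
    possible start positions (window - k + 1).
    """
    seq = seq.upper()
    oligo = oligo.upper()
    k = len(oligo)
    if k == 0 or window < k:
        raise ValueError("window must be at least as long as a non-empty oligo")
    n_pos = window - k + 1  # possible start positions per window
    result = []
    # Only full windows are reported; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        hits = sum(1 for i in range(n_pos) if chunk[i:i + k] == oligo)
        result.append((start, hits / n_pos))
    return result


if __name__ == "__main__":
    # Toy sequence: a CG-rich stretch followed by an AT-rich stretch.
    toy = "CGCGCGCG" + "ATATATAT"
    for start, frac in oligo_distribution_map(toy, "CG", window=8, step=8):
        print(start, round(frac, 3))
```

Plotting these fractions against window position along a scaffold or chromosome would show where an oligonucleotide is locally enriched, which is the kind of pattern the abstract describes near bat scaffold termini and human centromeric regions.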