The full set of potential open regions (PORs) in the human genome defined by consensus peaks of ATAC-seq data

bioRxiv (Cold Spring Harbor Laboratory), 2023

Abstract
Chromatin accessibility profiling methods such as the assay for transposase-accessible chromatin using sequencing (ATAC-seq) have advanced the identification of gene regulatory elements and the characterization of epigenetic landscapes. Unlike gene expression data, chromatin accessibility data have no consistent reference, which hinders large-scale integration analysis. Based on a systematic analysis of 1,785 ATAC-seq and 231 scATAC-seq datasets, we found that cells share the same set of potential open regions (PORs) across the genome. We proposed a unified reference called consensus peaks (cPeaks) to represent PORs across all observed cell types, and developed a deep-learning model to predict cPeaks not seen in the collected data. Together, the observed and predicted cPeaks define a full set of PORs in the human genome, which can serve as a common reference to which all ATAC-seq data can be aligned. Experiments showed that using this reference to integrate scATAC-seq data can improve cell annotation and facilitate the discovery of rare cell types. cPeaks also performed well in analyzing dynamic biological processes and diseases. These analyses and experiments suggest that PORs represent a set of inherent functional regions in the human genome.

Competing Interest Statement
The authors have declared no competing interest.
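The practical point of a single reference is that every dataset ends up described by the same features. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes fragments are given as (chrom, start, end, cell) tuples in half-open coordinates, that the reference (for example, cPeaks) is a list of non-overlapping (chrom, start, end) regions, and that a fragment is counted once for every region it overlaps. The function names `build_index` and `count_matrix` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Group reference regions by chromosome, sorted by start.

    Regions are (chrom, start, end) half-open intervals and are assumed
    to be non-overlapping, so ends are sorted whenever starts are.
    """
    index = defaultdict(list)
    for region_id, (chrom, start, end) in enumerate(regions):
        index[chrom].append((start, end, region_id))
    for chrom in index:
        index[chrom].sort()
    return index


def count_matrix(fragments, regions):
    """Return a sparse cell-by-region matrix as {cell: {region_id: count}}."""
    index = build_index(regions)
    starts = {chrom: [s for s, _, _ in ivs] for chrom, ivs in index.items()}
    counts = defaultdict(lambda: defaultdict(int))
    for chrom, f_start, f_end, cell in fragments:
        ivs = index.get(chrom)
        if not ivs:
            continue
        # Number of regions whose start < f_end (integer coordinates).
        j = bisect_right(starts[chrom], f_end - 1)
        # Walk left while the region still ends after the fragment starts;
        # because regions do not overlap, earlier regions end even sooner.
        k = j - 1
        while k >= 0 and ivs[k][1] > f_start:
            counts[cell][ivs[k][2]] += 1
            k -= 1
    return {cell: dict(row) for cell, row in counts.items()}


if __name__ == "__main__":
    regions = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150)]
    fragments = [
        ("chr1", 150, 350, "cellA"),
        ("chr1", 190, 210, "cellB"),
        ("chr2", 10, 40, "cellA"),
        ("chr2", 100, 120, "cellB"),
    ]
    print(count_matrix(fragments, regions))
```

Because the region list is fixed in advance, two datasets processed this way produce matrices with identical columns, which is what makes them directly comparable. Real pipelines often count Tn5 insertion sites (fragment ends) rather than whole-fragment overlaps; that choice is a separate assumption.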
Keywords
human genome, potential open regions, consensus peaks, ATAC-seq