A unified hypothesis-free feature extraction framework for diverse epigenomic data

Ali Tuğrul Balcı, Maria Chikina

bioRxiv (2024)

Abstract
Motivation: Epigenetic assays using next-generation sequencing (NGS) have furthered our understanding of functional genomic regions and the mechanisms of gene regulation. However, a single assay produces billions of data points represented by nucleotide-resolution signal tracks. The signal strength at a given nucleotide is subject to numerous sources of technical and biological noise and thus conveys limited information about the underlying biological state. To draw biological conclusions, the data are typically summarized into higher-order patterns. Numerous specialized algorithms for summarizing epigenetic signals have been proposed, including methods for peak calling and for finding differentially methylated regions. A key unifying principle underlying these approaches is that they all leverage the strong prior that the signal must be locally consistent.

Results: We propose L0 segmentation as a universal framework for extracting locally coherent signals from diverse epigenetic sources. L0 segmentation serves to both compress and smooth the input signal by approximating it as piecewise constant. We implement a highly scalable L0 segmentation with additional loss functions designed for NGS epigenetic data types, including Poisson loss for single tracks and binomial loss for methylation/coverage data. We show that the L0 segmentation approach retains the salient features of the data yet can identify subtle features, such as transcription end sites, that are missed by other analytic approaches.

Availability: Our approach is implemented as an R package “l01segmentation” with a C++ backend. Available at .

Competing Interest Statement: The authors have declared no competing interest.
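To make the idea of a penalized piecewise-constant approximation concrete, the following is a minimal Python sketch of segmentation with a Poisson loss and a fixed penalty per segment (an L0-style penalty on the number of segments). It is an illustration only: it is not the authors' implementation and not the API of the "l01segmentation" package. The function names `poisson_cost` and `l0_segment`, the `penalty` parameter, and the simple quadratic-time dynamic program are all assumptions made for this example; a scalable implementation would need a much faster algorithm.

```python
import math


def poisson_cost(total, length):
    """Minimized Poisson negative log-likelihood of one segment.

    The segment is fit with a single rate mu = total / length; terms that
    do not depend on the segmentation (log y!) are dropped.
    """
    if total == 0:
        return 0.0
    return total - total * math.log(total / length)


def l0_segment(counts, penalty):
    """Hypothetical helper: piecewise-constant fit of a count track.

    Minimizes (sum of per-segment Poisson costs) + penalty * (number of
    segments) by exhaustive dynamic programming over segment boundaries.
    Returns a list of (start, end, mean) tuples with half-open intervals.
    """
    n = len(counts)

    # Prefix sums give the total of any candidate segment in O(1).
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)

    best = [0.0] + [math.inf] * n   # best[e]: optimal cost of counts[:e]
    last_start = [0] * (n + 1)      # start of the final segment in that optimum

    for end in range(1, n + 1):
        for start in range(end):
            seg_total = prefix[end] - prefix[start]
            cost = best[start] + poisson_cost(seg_total, end - start) + penalty
            if cost < best[end]:
                best[end] = cost
                last_start[end] = start

    # Walk back through the stored boundaries to recover the segments.
    segments = []
    end = n
    while end > 0:
        start = last_start[end]
        mean = (prefix[end] - prefix[start]) / (end - start)
        segments.append((start, end, mean))
        end = start
    segments.reverse()
    return segments


if __name__ == "__main__":
    track = [1, 1, 1, 1, 1, 20, 20, 20, 20, 20]
    for start, end, mean in l0_segment(track, penalty=2.0):
        print(start, end, mean)
```

In this sketch a larger `penalty` yields fewer, longer segments (stronger compression and smoothing), while a smaller one follows the raw signal more closely. A binomial loss for methylation/coverage data could be substituted by replacing `poisson_cost` with the corresponding per-segment cost.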