Tissue of origin detection for cancer tumor using low-depth cfDNA samples through combination of tumor-specific methylation atlas and genome-wide methylation density in graph convolutional neural networks
bioRxiv (2023)
Abstract
Background: Cell-free DNA (cfDNA)-based assays hold great potential for detecting early cancer signals, yet determining the tissue-of-origin (TOO) of those signals remains challenging. Here, we investigated the contribution of a methylation atlas to TOO detection in low-depth cfDNA samples.
Methods: We constructed a tumor-specific methylation atlas (TSMA) using whole-genome bisulfite sequencing (WGBS) data from five types of tumor tissue (breast, colorectal, gastric, liver and lung cancer) and paired white blood cells (WBC). TSMA was used with a non-negative least squares matrix factorization (NNLS) deconvolution algorithm to estimate the abundance of each tumor tissue type in a WGBS sample. We showed that TSMA worked well with tumor tissue but struggled with cfDNA samples because of the overwhelming amount of WBC-derived DNA. To construct a model for TOO, we adopted a multi-modal strategy, combining the deconvolution scores from TSMA with other cfDNA features as inputs.
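The deconvolution step described above can be illustrated with a minimal sketch. It assumes the atlas is a matrix of mean methylation values (regions × tissue types) and uses `scipy.optimize.nnls` to estimate non-negative mixing fractions. The region count, the synthetic atlas values, the 5% tumor fraction, and the `deconvolve` helper are all hypothetical and not taken from the paper, which may differ in how the atlas is built and how scores are normalized.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve(atlas: np.ndarray, sample: np.ndarray) -> np.ndarray:
    """Estimate tissue fractions for one sample (hypothetical helper).

    atlas:  (n_regions, n_tissues) reference methylation values per tissue.
    sample: (n_regions,) observed methylation values in the mixed sample.
    Returns non-negative coefficients rescaled to sum to 1.
    """
    coef, _residual_norm = nnls(atlas, sample)
    total = coef.sum()
    return coef / total if total > 0 else coef


# Synthetic atlas: 200 regions, five tumor types plus WBC (values are made up).
tissues = ["breast", "colorectal", "gastric", "liver", "lung", "WBC"]
rng = np.random.default_rng(0)
atlas = rng.uniform(0.0, 1.0, size=(200, len(tissues)))

# A cfDNA-like mixture dominated by WBC-derived DNA (assumed 5% colorectal).
true_frac = np.array([0.0, 0.05, 0.0, 0.0, 0.0, 0.95])
sample = atlas @ true_frac

est = deconvolve(atlas, sample)
for name, frac in zip(tissues, est):
    print(f"{name:>10}: {frac:.3f}")
```

In this noise-free toy case the fractions are recovered almost exactly. With real low-depth data, noise in the sample can be comparable in size to a small tumor contribution, which is consistent with the difficulty the authors report for cfDNA samples.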
Results: Our final model was a graph convolutional neural network (GCNN) that used deconvolution scores and genome-wide methylation density (GWMD) features; it achieved an accuracy of 69% on a held-out validation dataset of 239 low-depth cfDNA samples.
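For readers unfamiliar with graph convolution, the following sketch shows a single layer using the widely used symmetric-normalization propagation rule, ReLU(D^(-1/2) (A + I) D^(-1/2) H W). The abstract does not specify the authors' architecture or how their graph is built, so everything here is an assumption for illustration: nodes are taken to be samples, edges are an arbitrary toy adjacency, and each node's features concatenate 6 deconvolution scores with 8 hypothetical GWMD bins.

```python
import numpy as np


def gcn_layer(adj: np.ndarray, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph convolution layer with self-loops and symmetric normalization.

    adj:    (n_nodes, n_nodes) symmetric 0/1 adjacency matrix.
    feats:  (n_nodes, n_in) node feature matrix.
    weight: (n_in, n_out) weight matrix (learned in practice, random here).
    """
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU activation


rng = np.random.default_rng(0)

# Toy graph over 4 samples (hypothetical edges, e.g. from feature similarity).
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Per-sample features: 6 deconvolution scores + 8 GWMD bins (sizes assumed).
deconv_scores = rng.uniform(size=(4, 6))
gwmd = rng.uniform(size=(4, 8))
feats = np.concatenate([deconv_scores, gwmd], axis=1)

weight = rng.normal(size=(feats.shape[1], 3))
out = gcn_layer(adj, feats, weight)
print(out.shape)
```

Each output row mixes a node's own features with those of its neighbors, which is how a model of this kind can share information between related inputs. A full classifier would stack such layers and end with a softmax over the five tissue types.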
Conclusions: We have demonstrated that our TSMA, in combination with other cfDNA features, can improve TOO detection in low-depth cfDNA samples.
### Competing Interest Statement
THN, NHN, HG, LST, MDP receive compensation and have an equity interest in Gene Solutions. NNTD, THT, VTCN, THHN, GTHN are employees of Gene Solutions. The authors ensure that this does not alter the accuracy or integrity of the manuscript. The study was funded by Gene Solutions. The sponsor has no role in the analysis of the data and the preparation of the manuscript.
### Abbreviations
* cfDNA: cell-free DNA
* TOO: tissue-of-origin
* TSMA: tumor-specific methylation atlas
* WBC: white blood cells
* NNLS: non-negative least squares matrix factorization
* GCNN: graph convolutional neural network
* GWMD: genome-wide methylation density
* TMD: targeted region methylation density
* GWFP: genome-wide fragmentation profile
* EM: end-motif
* CNA: copy number aberration