A new framework for detecting copy number variants from single nucleotide polymorphism data: 'rCNV', a versatile R package for paralogs and CNVs detection
bioRxiv (Cold Spring Harbor Laboratory), 2023
Abstract
Copy number variants (CNVs) are ubiquitous in eukaryotes, and studies show that they contribute to phenotypic variation and environmental adaptation and fuel species divergence at a previously unknown rate. However, detecting CNVs, especially in non-model organisms, is challenging because it requires costly genomic resources and complex computational infrastructure. To give researchers a low-cost and easily accessible alternative, we developed a robust statistical framework and an R package that detect CNVs from the allelic read depths of SNP data. The framework exploits two signals to detect alleles in putatively multi-copy regions: i) the distribution of allelic read-depth ratios in heterozygotes at individual SNPs, tested against the distribution expected under binomial sampling, and ii) SNPs showing an apparent excess of heterozygotes relative to Hardy-Weinberg equilibrium. Because several statistical tests are used to detect deviations in the allelic read-depth ratio distribution, the method accounts for sampling variation and reference bias, which minimizes false CNV calls. The framework is well suited to high-throughput short-read data and therefore to most GBS technologies (e.g., RADseq, exome capture, WGS), so it can call CNVs from genomes of varying complexity. The framework is implemented in the R package "rCNV", which automates the analysis. We trained our models on simulated data and tested them on four datasets generated with different sequencing technologies: RADseq (Chinook salmon, Oncorhynchus tshawytscha; American lobster, Homarus americanus), exome capture (Norway spruce, Picea abies), and WGS (malaria mosquito, Anopheles gambiae).
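The two signals the abstract describes can be illustrated with a short sketch. This is not rCNV's implementation: rCNV is an R package, and its actual functions, pooling rules and battery of tests are not reproduced here. The Python function names (`binom_two_sided_half`, `allele_ratio_test`, `het_excess_test`) are hypothetical.

The sketch assumes the following, all of which are choices made for illustration:

* Read depths are pooled over the heterozygotes at one SNP.
* A single exact binomial test against a 0.5 allele ratio stands in for the paper's multiple tests.
* Heterozygote excess is measured with a standard Pearson chi-square Hardy-Weinberg test plus F = 1 - H_obs/H_exp.

```python
import math
from typing import List, Tuple


def binom_two_sided_half(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5.

    The null is symmetric, so the p-value is twice the smaller tail (capped at 1).
    Exact integer arithmetic avoids underflow of 0.5**n at high read depth.
    """
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    lower = sum(math.comb(n, i) for i in range(tail + 1))
    return min(1.0, 2 * lower / 2 ** n)


def allele_ratio_test(het_depths: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Pool (ref, alt) read depths over the heterozygotes at one SNP.

    Returns the pooled alt-read ratio and the binomial p-value against the
    0.5 ratio expected for a single-copy diploid locus. Collapsed paralogs
    tend to skew the ratio away from 0.5 (e.g. toward 0.25 for 1 of 4 copies).
    """
    alt = sum(a for _, a in het_depths)
    total = sum(r + a for r, a in het_depths)
    ratio = alt / total if total else float("nan")
    return ratio, binom_two_sided_half(alt, total)


def het_excess_test(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square Hardy-Weinberg test (1 df) plus F = 1 - H_obs / H_exp.

    F < 0 means more heterozygotes than HWE predicts, which is the pattern
    produced when reads from two paralogous copies are merged into one SNP.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:  # monomorphic SNP: nothing to test
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    f = 1.0 - (n_ab / n) / (2 * p * q)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return f, math.erfc(math.sqrt(x2 / 2))


if __name__ == "__main__":
    single_copy = [(10, 10), (12, 8), (9, 11)]  # ratios near 0.5
    collapsed = [(30, 10), (28, 9), (33, 12)]   # ratios near 0.25
    print(allele_ratio_test(single_copy))  # ratio close to 0.48, large p-value
    print(allele_ratio_test(collapsed))    # ratio close to 0.25, p-value far below 0.001
    print(het_excess_test(10, 80, 10))     # F close to -0.6, very small p-value
```

In this sketch a SNP would be flagged as a putative multi-copy locus when either test rejects. Choosing the thresholds, combining evidence across tests and correcting for multiple testing are left open here, since the source does not specify how rCNV does them.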
### Competing Interest Statement
The authors have declared no competing interest.
Keywords
single nucleotide polymorphism data, copy number variants, nucleotide polymorphism