A new framework for detecting copy number variants from single nucleotide polymorphism data: 'rCNV', a versatile R package for paralogs and CNVs detection

bioRxiv (Cold Spring Harbor Laboratory), 2023

Abstract
Studies show that copy number variants (CNVs), owing to their ubiquitous presence in eukaryotes, contribute to phenotypic variation and environmental adaptation and fuel species divergence at a previously unknown rate. However, detecting CNVs, especially in non-model organisms, is challenging because it requires costly genomic resources and complex computational infrastructure. To provide researchers with a low-cost, easily accessible alternative, we developed a robust statistical framework and an R software package that detect CNVs from allelic read depths in SNP data. The core of the framework exploits two signals to detect alleles in putatively multi-copy regions: i) the distribution of allelic read-depth ratios in heterozygotes at individual SNPs, tested against the distribution expected under binomial sampling, and ii) SNPs showing an apparent excess of heterozygotes relative to Hardy-Weinberg equilibrium. Because multiple statistical tests are used to detect deviations in the allelic read-depth ratio distribution, the method is sensitive to sampling effects and aware of reference bias, which minimizes false CNV calls. The framework is well suited to high-throughput short-read data and hence to most genotyping-by-sequencing (GBS) technologies (e.g., RADseq, exome capture, WGS), so it can call CNVs from genomes of varying complexity. The framework is implemented in the R package "rCNV", which automates the analysis. We trained our models on simulated data and tested them on four datasets obtained with different sequencing technologies: RADseq (Chinook salmon, Oncorhynchus tshawytscha; American lobster, Homarus americanus), exome capture (Norway spruce, Picea abies), and WGS (malaria mosquito, Anopheles gambiae).

Competing Interest Statement
The authors have declared no competing interest.
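The two signals described in the abstract can be illustrated with a short sketch. This Python code is not rCNV's implementation (rCNV is an R package, and its actual tests and function names are not reproduced here); all function names below are hypothetical. It assumes a simplified design: reads from all heterozygotes at one SNP are pooled and tested against a 1:1 allele ratio with an exact two-sided binomial test, and heterozygote excess is summarized by the statistic F = 1 - Ho/He, where Ho and He are the observed and Hardy-Weinberg-expected heterozygote counts.

```python
from math import exp, lgamma, log


def binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial probability P(X = i), computed in log space to avoid overflow at high depth."""
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log(1 - p))


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of every outcome
    that is no more likely than the observed count k."""
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def allele_ratio_test(het_depths):
    """Hypothetical helper. het_depths holds (ref_reads, alt_reads) pairs for the
    heterozygotes at one SNP; reads are pooled and the alt fraction is tested against 0.5."""
    ref = sum(r for r, _ in het_depths)
    alt = sum(a for _, a in het_depths)
    total = ref + alt
    if total == 0:
        raise ValueError("no reads at this SNP")
    return alt / total, binom_two_sided_p(alt, total)


def heterozygote_excess(n_hom_ref: int, n_het: int, n_hom_alt: int):
    """Hypothetical helper. Returns the heterozygote count expected under
    Hardy-Weinberg equilibrium and F = 1 - Ho/He (negative F = heterozygote excess)."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    expected_het = 2 * p * (1 - p) * n
    f = 1 - n_het / expected_het if expected_het > 0 else 0.0
    return expected_het, f


if __name__ == "__main__":
    print(allele_ratio_test([(10, 10), (12, 12)]))  # balanced heterozygotes
    print(allele_ratio_test([(30, 10), (28, 12)]))  # skewed read ratio
    print(heterozygote_excess(10, 80, 10))          # more heterozygotes than expected
```

Under this simplification, a SNP whose heterozygotes show a skewed read ratio (second call) or a strongly negative F (third call) would be a candidate multi-copy locus. The abstract notes that the actual framework combines multiple tests to limit false positives from sampling noise and reference bias, which this single-test sketch does not attempt.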
Keywords
single nucleotide polymorphism data, copy number variants, nucleotide polymorphism