Genome-wide exploratory analysis for NARAC dataset with preparation for haplotype block partitioning through minor allele frequency quality control viewpoint

Mohamed N. Saad, Galena W. Zareef, Fatma S. Ibrahim, Ashraf M. Said, Hisham F. A. Hamed

Iran J. Comput. Sci. (2023)

Abstract
This article describes, analyzes, and visualizes a case–control genome-wide genotypic dataset from the North American Rheumatoid Arthritis Consortium (NARAC). The data are summarized by the number of females and males among cases and controls and by the percentage of missing data. Alleles and genotypes are counted, and the minor allele frequency (MAF) is calculated for each single nucleotide polymorphism (SNP). The SNPs are then classified by MAF into four categories: very rare, rare, low frequency, and common. The chromosomal locations of the SNPs in each category are examined to determine the proportion that falls in coding regions versus other regions. Each category shows a different proportion in each consequence-annotation region. The composition of the data in terms of alleles and genotypes is found to be highly disproportionate. The results give a clear picture of the data and its MAF, which can be compared with other datasets, and can help researchers gain a comprehensive understanding of such case–control datasets.
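The per-SNP MAF calculation and the four-way classification mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The genotype encoding (two-letter strings such as "AG", with None or an empty string for a missing call), the function names, and above all the cut-off values in THRESHOLDS are assumptions made for illustration; the abstract does not state the thresholds used.

```python
from collections import Counter

# Hypothetical MAF cut-offs (upper bounds, exclusive). The abstract names the
# four categories but does not give the thresholds, so these are assumptions.
THRESHOLDS = [(0.001, "very rare"), (0.01, "rare"), (0.05, "low frequency")]


def minor_allele_frequency(genotypes):
    """Return the MAF of one SNP, or None if every call is missing.

    `genotypes` is an iterable of two-letter genotype strings such as "AG";
    None or "" marks a missing call (assumed encoding).
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype:  # skip missing calls
            counts.update(genotype)  # count each allele character
    total = sum(counts.values())
    if total == 0:
        return None
    if len(counts) < 2:
        return 0.0  # monomorphic SNP: no minor allele observed
    # For a biallelic SNP the second most common allele is the minor allele.
    second_most_common = sorted(counts.values(), reverse=True)[1]
    return second_most_common / total


def classify_maf(maf):
    """Map a MAF value to one of the four categories named in the abstract."""
    for upper_bound, label in THRESHOLDS:
        if maf < upper_bound:
            return label
    return "common"


if __name__ == "__main__":
    calls = ["AA", "AG", "GG", "AA", None]  # toy data, not from NARAC
    maf = minor_allele_frequency(calls)
    print(maf, classify_maf(maf))
```

In the toy example, allele A is observed 5 times and G 3 times among 8 non-missing alleles, so G is the minor allele. Changing the values in THRESHOLDS is enough to apply a different category definition.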
Keywords
Single nucleotide polymorphism, Genome-wide exploratory analysis, Minor allele frequency, Variant annotation, Genome-wide association study