The Mega2R R package: tools for accessing and processing common genetic data formats in R

F1000Research (2017)

Abstract
The standalone C++ Mega2 program, available from https://watson.hgen.pitt.edu/register/, has been facilitating data reformatting for linkage and association analysis programs since 2000. Support for more analysis programs has been added over time, though this has required software developers with detailed knowledge of the Mega2 internals. Currently, Mega2 converts data from several different genetic analysis data formats (including PLINK, VCF, BCF, and IMPUTE2) into the specific data requirements for over 40 commonly used linkage and association analysis programs (including Mendel, Merlin, Morgan, SHAPEIT, ROADTRIPS, MaCH/minimac3, etc.). Recently, Mega2 has been enhanced to use a SQLite database as an intermediate data representation. Additionally, Mega2 now stores biallelic genotype data in a highly compressed form, much like that of the GenABEL R facility and the PLINK binary format. Concurrently, the R community and Bioconductor community have developed a variety of genetic analysis programs complementary to the programs supported by Mega2. The Mega2R R package now makes it easy to load SQLite3 Mega2 databases directly into R as data frames so that these R facilities can be used. In addition, we have developed C++ functions for R to decompress needed subsets of the genotype data, on the fly, in a memory-efficient manner. We have also created several more functions that illustrate how to use the Mega2R data frames and that perform useful tasks: these permit one to run the pedgene package to carry out gene-based association tests on family data using selected marker subsets, to run the SKAT package to carry out gene-based association tests using selected marker subsets, to output the Mega2R data as a VCF file and related files (for phenotype and family data), and to convert the data frames into GenABEL R gwaa.data-class objects.
The Mega2R package enhances GenABEL since it supports additional input data formats (such as PLINK, VCF, and IMPUTE2) not currently supported by GenABEL. Mega2R is available here: https://CRAN.R-project.org/package=Mega2R. This work was supported by NIH grant R01 GM076667 (PI: Weeks).
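
The compressed genotype storage and on-the-fly subset decompression described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not Mega2's or Mega2R's actual encoding or API: the 2-bit code assignment and the names pack_genotypes and unpack_subset are assumptions made only for illustration. The idea is that a biallelic genotype needs just four states, so four genotypes fit in one byte, and a requested marker subset can be decoded without expanding the rest.

```python
from typing import List, Sequence

# Hypothetical 2-bit codes for a biallelic marker (NOT Mega2's actual encoding):
# 0 = homozygous allele 1, 1 = heterozygous, 2 = homozygous allele 2, 3 = missing
GENOTYPES_PER_BYTE = 4


def pack_genotypes(codes: Sequence[int]) -> bytes:
    """Pack genotype codes (each 0..3) into bytes, four codes per byte."""
    n_bytes = (len(codes) + GENOTYPES_PER_BYTE - 1) // GENOTYPES_PER_BYTE
    packed = bytearray(n_bytes)
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError(f"genotype code out of range: {code}")
        # Each code occupies two bits; the position within the byte is i mod 4.
        packed[i // GENOTYPES_PER_BYTE] |= code << (2 * (i % GENOTYPES_PER_BYTE))
    return bytes(packed)


def unpack_subset(packed: bytes, indices: Sequence[int]) -> List[int]:
    """Decode only the requested marker positions, leaving the rest packed."""
    return [
        (packed[i // GENOTYPES_PER_BYTE] >> (2 * (i % GENOTYPES_PER_BYTE))) & 0b11
        for i in indices
    ]


# One person's genotype codes at six markers, stored in two bytes.
person = [0, 1, 2, 3, 1, 0]
blob = pack_genotypes(person)

# Decode only markers 1, 4, and 5 (zero-based), e.g. the markers within one gene.
subset = unpack_subset(blob, [1, 4, 5])
```

In this sketch, memory use during analysis scales with the size of the selected marker subset, not with the full genotype matrix, which is the property the abstract attributes to the package's C++ decompression functions.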