Gsdcreator: An Efficient And Comprehensive Simulator For Genarating Ngs Data With Population Genetic Information
2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)(2019)
摘要
In recent decades, NGS data analysis has become a major research field in bioinformatics, which presents great advantages in many application scenarios. Many algorithms and software were designed for analyzing the NGS data, while simulation datasets are urgently needed for testing software and optimizing their parameter configurations. Thus, a series of NGS data simulators have been published. However, the existing simulators cannot satisfy the requirements from many specific scenarios. First, they do not support many newly discovered variations. Second, complex structural variations are difficult to generate. In addition, along with the increase of population data, it is urgent to increase population information simulation. In this paper, we propose GSDcreator, a comprehensive NGS simulator that overcome the three weaknesses mentioned above. It can produce all known types of variation, where the complex of variations are also supported. Furthermore, it can capture many important real data features including population polymorphism, insert size distribution, adjacent site depth distribution, overall depth distribution, quality score distribution, amplification bias, sequencing errors and so on. It's highlighted that 1000 Genomes Project Database is taken as a reference and integrates population genetic information to simulate population polymorphism. To test the performance, we did a lot of experiments and found that simulated data produced by GSDcreator are quit mimic to the real sequencing data.
更多查看译文
关键词
population genomics, next-generation sequencing data analysis, data simulator, population information
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络