Inferred spoligoforest topology unravels spatially bimodal distribution of mutations in the DR region.

IEEE Transactions on Nanobioscience (2012)

Abstract
Biomarkers of Mycobacterium tuberculosis complex (MTBC) mutate over time. Among the biomarkers of MTBC, spacer oligonucleotide type (spoligotype) and mycobacterial interspersed repetitive unit (MIRU) patterns are commonly used to genotype clinical MTBC strains. In this study, we present an evolution model of spoligotype rearrangements that uses MIRU patterns to disambiguate the ancestors of spoligotypes. We use a large patient dataset from the United States Centers for Disease Control and Prevention (CDC) to generate this model. Based on the contiguous deletion assumption and the rare observation of convergent evolution, we first generate the most parsimonious forest of spoligotypes, called a spoligoforest, using three genetic distance measures. An analysis of the topological attributes of the spoligoforest and the number of variations at the direct repeat (DR) locus of each strain reveals interesting properties of deletions in the DR region. First, we compare our mutation model to existing mutation models of spoligotypes and find that our model produces as many within-lineage mutation events as other models, with slightly higher segregation accuracy. Second, based on our mutation model, the number of descendant spoligotypes follows a power law distribution. Third, contrary to prior studies, the power law distribution does not plausibly fit the mutation length frequency. Moreover, we find that the total number of mutation events at consecutive spacers follows a spatially bimodal distribution. The two modes are spacers 13 and 40, which are hotspots for chromosomal rearrangements, and the change point is spacer 34, which is absent in most MTBC strains. Based on this observation, we build two alternative models for mutation length frequency: the Starting Point Model (SPM) and the Longest Block Model (LBM). Both models are plausibly good fits to the mutation length frequency distribution, as verified by the goodness-of-fit test. We also apply SPM and LBM to a dataset from Institut Pasteur de Guadeloupe and verify that these models hold for different strain datasets.
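The contiguous deletion assumption mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a spoligotype is encoded as a binary string with one character per spacer ('1' present, '0' absent; 43 spacers in the standard scheme), and that a child pattern is a valid one-step descendant of a parent only if it is obtained by losing a single run of adjacent spacers. The function name `deleted_block` and the choice to report the number of spacers actually lost as the mutation length are assumptions made for this illustration.

```python
from typing import Optional, Tuple


def deleted_block(parent: str, child: str) -> Optional[Tuple[int, int, int]]:
    """Check whether `child` arises from `parent` by one contiguous deletion.

    Hypothetical helper for illustration only. Returns a tuple
    (first_spacer, last_spacer, spacers_lost) with 1-indexed spacer
    positions, or None if no single contiguous deletion explains the change.
    """
    if len(parent) != len(child):
        raise ValueError("spoligotypes must have the same number of spacers")

    # Positions (0-indexed) where the two patterns differ.
    diff = [i for i, (p, c) in enumerate(zip(parent, child)) if p != c]
    if not diff:
        return None  # identical patterns: no mutation event

    # Spacers can only be lost (1 -> 0), never regained.
    if any(parent[i] != "1" or child[i] != "0" for i in diff):
        return None

    start, end = diff[0], diff[-1]
    # The child must have no spacer left inside the deleted block.
    if "1" in child[start:end + 1]:
        return None

    return start + 1, end + 1, len(diff)


if __name__ == "__main__":
    ancestor = "1" * 43
    # Loss of spacers 13 to 15 in a single event.
    descendant = "1" * 12 + "0" * 3 + "1" * 28
    print(deleted_block(ancestor, descendant))
```

Under these assumptions, tallying `spacers_lost` over all parent-child edges of a forest would give a mutation length frequency, and tallying the affected spacer positions would give the per-spacer event counts whose distribution the abstract describes as spatially bimodal.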
Keywords
contiguous deletion assumption, medical information systems, bimodal distribution, mutation length frequency, chromosomal rearrangement, inferred spoligoforest topology, diseases, MIRU-VNTR, spoligotype rearrangement, spoligotype, tuberculosis, direct repeat locus, mutation model, patient dataset, Starting Point Model, United States Centers for Disease Control and Prevention, within-lineage mutation event, power law distribution, spatially bimodal distribution, strain dataset, Longest Block Model, medical computing, MIRU pattern, genetic distance, Mycobacterium tuberculosis complex, DR region, direct repeat, convergent evolution