Demixer: A probabilistic generative model to delineate different strains of a microbial species in a mixed infection sample

biorxiv(2024)

引用 0|浏览0
暂无评分
摘要
Multi-drug resistant or hetero-resistant Tuberculosis (TB) hinders the successful treatment of TB. Hetero-resistant TB occurs when multiple strains of the TB-causing bacterium with varying degrees of drug susceptibility are present in an individual. Existing studies predicting the proportion and identity of strains in a mixed infection sample rely on a reference database of known strains. A main challenge then is to identify de novo strains not present in the reference database, while quantifying the proportion of known strains. We present Demixer, a probabilistic generative model that uses a combination of reference-based and reference-free techniques to delineate mixed infection strains in whole genome sequencing (WGS) data. Demixer extends a topic model widely used in text mining to represent known mutations and discover novel ones. Parallelization and other heuristics enabled Demixer to process large datasets like CRyPTIC (Comprehensive Resistance Prediction for Tuberculosis: an International Consortium). In both synthetic and experimental benchmark datasets, our proposed method precisely detected the identity (e.g., 91.67% accuracy on the experimental in vitro dataset) as well as the proportions of the mixed strains. In real-world applications, Demixer revealed novel high confidence mixed infections (101 out of 1,963 Malawi samples analyzed), and new insights into the global frequency of mixed infection (2% at the most stringent threshold in the CRyPTIC dataset) and its significant association to drug resistance. Our approach is generalizable and hence applicable to any bacterial and viral WGS data. ### Competing Interest Statement The authors have declared no competing interest.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要