AB_SA: Tracing the source of bacterial strains based on accessory genes. Application to Salmonella Typhimurium environmental strains

bioRxiv (2019)

Abstract
Attributing pathogenic strains isolated from environmental samples or human cases to their original source is challenging. These pathogens usually colonize multiple animal hosts, including livestock, which contaminate food production chains and the environment (e.g. soil and water), adding to the public health burden and making the source difficult to identify. Genomic data open new opportunities for developing statistical models that infer the likely source of pathogen contamination. Here, we propose a computationally fast and efficient multinomial logistic regression (MLR) source attribution classifier that predicts the animal source of bacterial isolates based on “source-enriched” loci extracted from the accessory-genome profiles of a pangenomic dataset. Depending on the accuracy of the model’s self-attribution step, the modeler selects the number of candidate accessory genes that best fit the model for calculating the likelihood of (source) category membership. The accessory genes-based source attribution (AB_SA) method was applied to a dataset of strains of Salmonella Typhimurium and its monophasic variants (S. 1,4,[5],12:i:-). The model was trained on 69 strains with known animal source categories (i.e., poultry, ruminant, and pig). The AB_SA method identified eight genes as predictors among the 2,802 accessory genes. The self-attribution accuracy was 80%. The AB_SA model was then able to classify 25 of the 29 S. Typhimurium and S. 1,4,[5],12:i:- isolates collected from the environment (considered to be of unknown source) into a specific category (i.e., animal source) with a probability greater than 85%. The AB_SA method described here provides a user-friendly and valuable tool for performing source attribution studies in a few steps. AB_SA is written in R and freely available at .
Keywords
Source attribution, Salmonella Typhimurium, environmental contamination, pangenome-wide enrichment analysis, multinomial logistic regression
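
Illustrative sketch

The workflow summarized in the abstract (train an MLR classifier on gene presence/absence profiles, check self-attribution accuracy, then assign environmental isolates only when the top probability exceeds 85%) can be sketched roughly as follows. This is not the AB_SA code, which is written in R; it is a minimal Python toy under stated assumptions. The four-gene profiles are synthetic, the function names (`fit_mlr`, `attribute`, and so on) and the learning-rate, epoch, and regularization values are hypothetical, and the enrichment step that selects the predictor genes is not reproduced.

```python
import math

SOURCES = ["poultry", "ruminant", "pig"]

# Synthetic presence (1) / absence (0) profiles over four hypothetical
# accessory genes. These are NOT data from the study.
TRAIN_PROFILES = [
    [1, 0, 0, 1], [1, 0, 0, 0], [1, 0, 0, 1],  # poultry
    [0, 1, 0, 0], [0, 1, 0, 1], [0, 1, 0, 0],  # ruminant
    [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 1, 1],  # pig
]
TRAIN_LABELS = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # indices into SOURCES


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, profile):
    """Class membership probabilities for one gene profile."""
    row = list(profile) + [1.0]  # trailing 1.0 multiplies the intercept
    return softmax([sum(w * v for w, v in zip(weights, row)) for weights in W])


def predict_class(W, profile):
    """Index of the most probable source category."""
    probs = predict_proba(W, profile)
    return max(range(len(probs)), key=probs.__getitem__)


def fit_mlr(X, y, n_classes, lr=0.5, epochs=1000, l2=0.001):
    """Fit multinomial logistic regression by batch gradient descent."""
    n_cols = len(X[0]) + 1  # genes plus intercept
    W = [[0.0] * n_cols for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_cols for _ in range(n_classes)]
        for profile, label in zip(X, y):
            row = list(profile) + [1.0]
            probs = predict_proba(W, profile)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, value in enumerate(row):
                    grad[k][j] += err * value
        for k in range(n_classes):
            for j in range(n_cols):
                W[k][j] -= lr * (grad[k][j] / len(X) + l2 * W[k][j])
    return W


def attribute(W, profile, threshold=0.85):
    """Return (source, probability); source is None below the threshold."""
    probs = predict_proba(W, profile)
    best = predict_class(W, profile)
    source = SOURCES[best] if probs[best] > threshold else None
    return source, probs[best]


W = fit_mlr(TRAIN_PROFILES, TRAIN_LABELS, len(SOURCES))

# Self-attribution: re-classify the training strains with the fitted model.
self_accuracy = sum(
    predict_class(W, x) == y for x, y in zip(TRAIN_PROFILES, TRAIN_LABELS)
) / len(TRAIN_LABELS)

print("self-attribution accuracy:", self_accuracy)
print("clear profile:", attribute(W, [1, 0, 0, 1]))
print("ambiguous profile:", attribute(W, [1, 1, 0, 0]))
```

In this toy setup, an isolate whose profile matches one source is assigned to it, while an isolate carrying marker genes from two sources stays unassigned because no category passes the threshold. That mirrors the abstract's distinction between the 25 classified isolates and the remaining 4.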