uniForest: an unsupervised machine learning technique to detect outliers and restrict variance in microbiome studies

bioRxiv (2021)

Abstract
Isolation Forests is an unsupervised machine learning technique for detecting outliers in continuous datasets. It does not require an underlying equivariant or Gaussian distribution and is suitable for use on small datasets. While this procedure is widely used across quantitative fields, to our knowledge this is the first study dedicated solely to assessing its use on microbiome datasets. Here we present uniForest, an interactive Python notebook (which can be run from any desktop computer using the Google Colaboratory web service) for processing microbiome outliers. We used uniForest to apply Isolation Forests to the Healthy Human Microbiome project dataset, imputing outliers with the mean of the remaining inliers to maintain sample size, and assessed its performance in reducing variance in both community structure and derived ecological statistics (α-diversity). We also assessed its effect on anatomical site differentiation (pre- and post-processing) using principal component analysis, dissimilarity matrices, and ANOSIM. We observed a minimum variance reduction of 81.17% across the entire dataset and in alpha diversity at the Phylum level. Application of Isolation Forests also separated the dataset with extremely high specificity, reducing variance within taxa samples by a minimum of 81.33%. Isolation Forests is evidently a potent tool for restricting the effect of variance in microbiome analysis and has potential for broad application in studies where high levels of microbiome variance are expected. This software allows for clean analyses of otherwise noisy datasets.

### Competing Interest Statement

This research was performed using a Postdoctoral Fellowship provided to RL by Alltech. RM received a salary from Alltech at the time of these experiments.
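
The outlier handling described in the abstract (flag outliers with Isolation Forests, replace them with the mean of the remaining inliers, then measure the drop in variance) can be illustrated with a minimal sketch. This is not the uniForest notebook code. It assumes scikit-learn's `IsolationForest` with its default `contamination="auto"` threshold, applied to each taxon column separately, and it runs on synthetic toy data. The function names `impute_outliers_with_inlier_mean` and `variance_reduction_percent` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def impute_outliers_with_inlier_mean(abundances, random_state=0):
    """Flag outliers in each column with an Isolation Forest and replace them
    with the mean of that column's inliers, keeping the sample count unchanged.

    Assumption: columns (taxa) are treated independently; the original
    notebook may apply the forest differently.
    """
    data = np.asarray(abundances, dtype=float)
    cleaned = data.copy()
    outlier_mask = np.zeros(data.shape, dtype=bool)

    for j in range(data.shape[1]):
        column = data[:, j].reshape(-1, 1)  # scikit-learn expects a 2-D array
        forest = IsolationForest(n_estimators=100, random_state=random_state)
        labels = forest.fit_predict(column)  # 1 = inlier, -1 = outlier
        is_outlier = labels == -1
        # Leave the column untouched if nothing was flagged, or if everything
        # was flagged and no inlier mean exists.
        if is_outlier.all() or not is_outlier.any():
            continue
        cleaned[is_outlier, j] = data[~is_outlier, j].mean()
        outlier_mask[:, j] = is_outlier

    return cleaned, outlier_mask


def variance_reduction_percent(before, after):
    """Percentage drop in per-column variance (assumes non-constant columns)."""
    var_before = np.var(before, axis=0)
    var_after = np.var(after, axis=0)
    return 100.0 * (var_before - var_after) / var_before


# Synthetic toy data (not microbiome data): 30 samples, 2 columns,
# with one deliberately extreme value planted in the first column.
rng = np.random.default_rng(0)
toy = rng.normal(loc=10.0, scale=1.0, size=(30, 2))
toy[0, 0] = 60.0

cleaned, mask = impute_outliers_with_inlier_mean(toy)
reduction = variance_reduction_percent(toy, cleaned)
```

Replacing flagged values with the inlier mean can never increase a column's variance, because the inlier mean minimizes the squared deviations of the values that remain. The size of the reduction depends on how many points the forest flags, so the percentages reported in the abstract should not be expected from this toy example.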
Keywords
microbiome,unsupervised machine learning technique,outliers,machine learning