Permutation analysis prior to variable selection greatly enhances robustness of OPLS analysis in small cohorts

bioRxiv (2024)

Abstract
The R workflow roplspvs (R orthogonal projections of latent structures with permutation over variable selection) facilitates variable selection, model optimization, and permutation-based significance testing of OPLS-DA models, with the scaled loadings (p[corr]) as the main metric for the significance cutoff. Permutations are performed prior to variable selection (sans), after variable selection (post), and including the variable selection procedure (over). The resulting p-values for the correlation of the model (R2) and the cross-validated correlation of the model (Q2), sans, post, and over variable selection, are provided as additional model statistics. These statistics are useful for determining the true significance level of OPLS models, which has otherwise proven difficult to assess, particularly for small sample sizes. Furthermore, we propose a means of estimating the background noise level based on permuted false positive rates of R2 and Q2. This novel metric is then used to calculate an adjusted Q2 value. Using a publicly available metabolomics dataset, we demonstrate the advantage of performing permutations over variable selection for small sample sizes. Iteratively reducing the sample size resulted in overinflated models with increasing R2 and Q2, and permutations post variable selection indicated falsely significant models. In contrast, the adjusted Q2 was only marginally affected by sample size and represents a robust estimate of model predictability, and permutations over variable selection showed the true significance of the models.

Competing Interest Statement: The authors have declared no competing interest.
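The difference between permuting post and over variable selection, as described in the abstract, can be illustrated with a minimal Python sketch. This is not the roplspvs implementation. It assumes a stand-in model (a single covariance-weighted score rather than OPLS-DA), a stand-in selection rule (top-k variables by absolute Pearson correlation rather than a p[corr] cutoff), and R2 only (no cross-validated Q2). All function names, the value of k, and the simulated pure-noise data are hypothetical and chosen for illustration.

```python
import numpy as np


def select_top_k(X, y, k):
    """Indices of the k variables with the largest |Pearson r| against y.

    Hypothetical stand-in for a p[corr]-based selection step.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))[:k]


def r2_one_component(X, y):
    """Squared correlation between y and the score t = Xc @ (Xc.T @ yc).

    Hypothetical stand-in for the R2 of a fitted OPLS-DA model.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    t = Xc @ (Xc.T @ yc)
    return float(np.corrcoef(t, yc)[0, 1] ** 2)


def permutation_p_values(X, y, k=10, n_perm=200, seed=0):
    """Return (observed R2, p-value post selection, p-value over selection)."""
    rng = np.random.default_rng(seed)
    selected = select_top_k(X, y, k)
    r2_obs = r2_one_component(X[:, selected], y)

    post = np.empty(n_perm)
    over = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y)
        # Post: keep the variables chosen with the true labels.
        post[i] = r2_one_component(X[:, selected], y_perm)
        # Over: repeat the selection step with the permuted labels.
        reselected = select_top_k(X, y_perm, k)
        over[i] = r2_one_component(X[:, reselected], y_perm)

    # Add-one correction so the p-value is never exactly zero.
    p_post = float((1 + np.sum(post >= r2_obs)) / (n_perm + 1))
    p_over = float((1 + np.sum(over >= r2_obs)) / (n_perm + 1))
    return r2_obs, p_post, p_over


# Simulated small cohort with no real group difference (assumption).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(12, 200))
y = np.array([0.0] * 6 + [1.0] * 6)

r2_obs, p_post, p_over = permutation_p_values(X, y)
print(f"R2 = {r2_obs:.3f}, p(post) = {p_post:.3f}, p(over) = {p_over:.3f}")
```

In the "post" loop the permuted models are restricted to variables that were already chosen to fit the true labels, so they are not given the same chance to overfit as the observed model. In the "over" loop every permuted model repeats the selection step, which makes the null distribution comparable to the observed statistic.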