Development of Variance Rank Initiated-Unsupervised Sample Indexing for Gas Chromatography-Mass Spectrometry Analysis
TALANTA(2021)
Univ Washington
Abstract
Traditional non-targeted chemometric workflows for gas chromatography-mass spectrometry (GC-MS) data rely on using supervised methods, which requires a priori knowledge of sample class membership. Herein, we propose a simple, unsupervised chemometric workflow known as variance rank initiated-unsupervised sample indexing (VRI-USI). VRI-USI discovers analyte peaks exhibiting high relative variance across all samples, followed by kmeans clustering on the individual peaks. Based upon how the samples cluster for a given peak, a sample index assignment is provided. Using a probabilistic argument, if the same sample index assignment appears for several discovered peaks, then this outcome strongly suggests that the samples are properly classified by that particular sample index assignment. Thus, relevant chemical differences between the samples have been discovered in an unsupervised fashion. The VRI-USI workflow is demonstrated on three, increasingly difficult datasets: simulations, yeast metabolomics, and human cancer metabolomics. For simulated GC-MS datasets, VRI-USI discovered 85-90% of analytes modeled to vary between sample classes. Nineteen out of 53 peaks in the peak table developed for the yeast metabolome dataset had the same sample index assignments, indicating that those indices are most likely due to class-distinguishing chemical differences. A t-test revealed that 22 out of 53 peaks were statistically significant (p < 0.05) when using those sample index assignments. Likewise, for the human cancer metabolomics study, VRI-USI discovered 25 analytes that were statistically different (p < 0.05) using the sample index assignments determined to highlight meaningful sample-based differences. For all datasets, the sample index assignments that were deduced from VRI-USI were the correct class-based difference when using prior knowledge. VRI-USI holds promise as an exploratory data analysis workflow for studies in which analysts do not readily have a priori class information or want to uncover the underlying nature of their dataset.
MoreTranslated text
Key words
Unsupervised Variance rank initiated-unsupervised sample indexing,Gas chromatography-mass spectrometry,Chemometrics,Exploratory data analysis
PDF
View via Publisher
AI Read Science
Must-Reading Tree
Example

Generate MRT to find the research sequence of this paper
Data Disclaimer
The page data are from open Internet sources, cooperative publishers and automatic analysis results through AI technology. We do not make any commitments and guarantees for the validity, accuracy, correctness, reliability, completeness and timeliness of the page data. If you have any questions, please contact us by email: report@aminer.cn
Chat Paper
Summary is being generated by the instructions you defined