Intrinsic-Dimension analysis for guiding dimensionality reduction and data fusion in multi-omics data processing

Jessica Gliozzo, Valentina Guarino, Arturo Bonometti,Alberto Cabri, Emanuale Cavalleri, Mauricio Soto Gomez,Justin Reese,Peter Nicholas Robinson,Marco Mesiti,Giorgio Valentini,Elena Casiraghi

crossref(2024)

引用 0|浏览9
暂无评分
摘要
The advent of high-throughput sequencing technologies has revolutionized the field of multi-omics patient data analysis. While these techniques offer a wealth of information, they often generate datasets with dimensions far surpassing the number of available cases. This discrepancy in size gives rise to the challenging ''small-sample-size'' problem, significantly compromising the reliability of any subsequent estimate, whether supervised or unsupervised. This calls for effective dimensionality reduction techniques to transform high-dimensional datasets into lower-dimensional spaces, making the data manageable and facilitating subsequent analyses. Unfortunately, the definition of a proper dimensionality reduction pipeline is not an easy task; besides the problem of identifying the best dimensionality reduction method, the definition of the dimension of the lower-dimensional space into which each dataset should be transformed is a crucial issue that influences all the subsequent analyses and should therefore be carefully considered. Further, the availability of multi-modal data calls for proper data-fusion techniques to produce an integrated patient-view into which redundant information is removed while salient and complementary information across views is leveraged to improve the performance and reliability of both unsupervised and supervised learning techniques. This paper proposes leveraging the intrinsic dimensionality of each view in a multi-modal dataset to define the dimensionality of the lower-dimensional space where the view is transformed by dimensionality reduction algorithms. Further, it presents a thorough experimental study that compares the traditional application of a unique-step of dimensionality reduction with a two-step approach, involving a prior feature selection followed by feature extraction. Through this comparative evaluation, we scrutinize the performance of widely used dimensionality reduction algorithms. Importantly, we also investigate their impact on unsupervised data-fusion techniques, which are pivotal in biomedical research. Our findings shed light on the most effective strategies for handling high-dimensional multi-omics patient data, offering valuable insights for future studies in this domain.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要