Scaling up ridge regression for brain encoding in a massive individual fMRI dataset
arXiv (2024)
Abstract
Brain encoding with neuroimaging data is an established analysis aimed at
predicting human brain activity directly from complex stimulus features such as
movie frames. Typically, these features are the latent space representation
from an artificial neural network, and the stimuli are image, audio, or text
inputs. Ridge regression is a popular prediction model for brain encoding due
to its good out-of-sample generalization performance. However, training a ridge
regression model can be highly time-consuming when dealing with large-scale
deep functional magnetic resonance imaging (fMRI) datasets that include many
space-time samples of brain activity. This paper evaluates different
parallelization techniques to reduce the training time of brain encoding with
ridge regression on the CNeuroMod Friends dataset, one of the largest deep fMRI
resources currently available. With multi-threading, our results show that the
Intel Math Kernel Library (MKL) significantly outperforms the OpenBLAS library,
being 1.9 times faster using 32 threads on a single machine. We then evaluated
the Dask multi-CPU implementation of ridge regression readily available in
scikit-learn (MultiOutput), and we proposed a new "batch" version of Dask
parallelization, motivated by a time complexity analysis. In line with our
theoretical analysis, MultiOutput parallelization was found to be impractical,
i.e., slower than multi-threading on a single machine. In contrast, the
Batch-MultiOutput regression scaled well across compute nodes and threads,
providing speed-ups of up to 33 times with 8 compute nodes and 32 threads
compared to a single-threaded scikit-learn execution. Batch parallelization
using Dask thus emerges as a scalable approach for brain encoding with ridge
regression on high-performance computing systems using scikit-learn and large
fMRI datasets.
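
The batch idea described above can be illustrated with a minimal sketch. It is
not the paper's implementation: it assumes that the brain targets (the columns
of Y) are split into a few contiguous batches, that each batch is fitted by one
scikit-learn Ridge model, and that the batches run in parallel. joblib threads
stand in for Dask workers here so that the example stays self-contained. The
helper names fit_batch and batch_multioutput_ridge, the synthetic data, and
all parameter values are hypothetical.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.linear_model import Ridge


def fit_batch(X, Y_batch, alpha):
    # One Ridge model fits every target column in the batch at once.
    return Ridge(alpha=alpha).fit(X, Y_batch)


def batch_multioutput_ridge(X, Y, n_batches=4, alpha=1.0, n_jobs=2):
    # Split the target column indices into contiguous batches.
    batches = np.array_split(np.arange(Y.shape[1]), n_batches)
    # Fit the batches in parallel (joblib threads here, an assumption;
    # the paper distributes the work with Dask).
    models = Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(fit_batch)(X, Y[:, idx], alpha) for idx in batches
    )
    # Reassemble coefficients into shape (n_targets, n_features).
    coef = np.vstack([m.coef_ for m in models])
    intercept = np.concatenate([m.intercept_ for m in models])
    return coef, intercept


# Synthetic stand-ins for stimulus features (X) and brain targets (Y).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
Y = rng.standard_normal((200, 40))

coef, intercept = batch_multioutput_ridge(X, Y)

# Reference: a single Ridge model fitted on all targets together.
reference = Ridge(alpha=1.0).fit(X, Y)
```

Because ridge regression solves each target independently given the same
features, the batched coefficients should agree with the single-model fit up
to numerical tolerance. The number of batches then controls how much work each
parallel task receives, as opposed to creating one task per target.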