BLB-gcForest: A High-Performance Distributed Deep Forest With Adaptive Sub-Forest Splitting

IEEE Transactions on Parallel and Distributed Systems (2022)

Abstract
As a competitive alternative to deep neural networks, Deep Forest offers low complexity, fewer hyper-parameters, and good robustness, properties that are highly desirable in distributed computing applications and ecosystems. Recently, an efficient distributed Deep Forest system named ForestLayer was proposed; it uses a fine-grained, sub-Forest-based task-parallel algorithm to improve the parallel computing efficiency of Deep Forest. However, the sub-Forest splitting in ForestLayer is static and one-off, with no adaptation to the computing environment, even though the splitting granularity has a significant impact on system performance. To further improve the computing efficiency and scalability of distributed Deep Forest, in this paper we propose a novel distributed Deep Forest algorithm named BLB-gcForest (Bag of Little Bootstraps-gcForest), which augments the gcForest (multi-Grained Cascade Forest) approach for constructing Deep Forest. BLB-gcForest parallelizes computation for each tree in the sub-Forests, a finer parallel granularity, and integrates the Bag of Little Bootstraps (BLB) mechanism to reduce the large number of feature instances transmitted for the Cascade Forest layers, improving both computation efficiency and communication efficiency. Moreover, to address the forest splitting granularity problem, we design an adaptive sub-Forest splitting algorithm to ensure maximum resource utilization for the parallel computation of each sub-Forest. Experimental results on four well-known large-scale datasets, namely YEAST, LETTER, MNIST, and CIFAR10, show that BLB-gcForest achieves training speedups of up to 20.3x and 1.64x over the state-of-the-art gcForest and ForestLayer, respectively, while guaranteeing higher accuracy and better robustness.
Keywords
Deep forest, distributed computing, big data bootstrap, distributed AI
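
The Bag of Little Bootstraps mechanism named in the abstract can be illustrated with a minimal sketch of the generic BLB procedure. This is not the BLB-gcForest implementation, and the abstract does not say how BLB is wired into the Cascade Forest layers. The function name `blb_standard_error`, its parameters, and the choice of the sample mean as the estimator are all assumptions made for illustration. The point it shows is that each resample of nominal size n is represented by only b = n^gamma distinct points plus integer counts, which suggests why far fewer instances would need to be transmitted.

```python
import random
import statistics
from collections import Counter


def blb_standard_error(data, num_subsets=5, num_resamples=50, gamma=0.7, seed=0):
    """Estimate the standard error of the sample mean with generic BLB.

    Hypothetical helper for illustration; all names and defaults are assumptions.
    Returns (estimated standard error, little-subset size b).
    """
    rng = random.Random(seed)
    n = len(data)
    # Little-subset size b = n^gamma, clamped to a valid range.
    b = max(2, min(n, int(round(n ** gamma))))

    per_subset_se = []
    for _ in range(num_subsets):
        # Draw b distinct points without replacement: the "little" subset.
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(num_resamples):
            # Multinomial resample of nominal size n, stored as counts
            # over the b subset points only.
            counts = Counter(rng.choices(range(b), k=n))
            weighted_mean = sum(subset[i] * c for i, c in counts.items()) / n
            estimates.append(weighted_mean)
        # Quality measure for this subset: spread of the resampled estimates.
        per_subset_se.append(statistics.stdev(estimates))

    # BLB averages the per-subset quality measures.
    return statistics.fmean(per_subset_se), b
```

For example, calling `blb_standard_error(data)` on 2000 draws from a standard normal distribution should return a standard error near 1/sqrt(2000), while each subset holds only a few hundred distinct points.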