Parallelized implementation of an explicit finite element method in many integrated core (MIC) architecture.

Advances in Engineering Software (2018)

Abstract
A high performance nonlinear dynamic simulation system based on a many integrated core architecture is developed. This parallel simulation system involves a vectorized, thread-level parallel Belytschko-Tsay shell element. A fully vectorized fine-grained parallel computing method is presented to exploit the natural parallelism of the explicit FE algorithm. A heterogeneous calculation method is adopted to overlap data transfer and offloaded computation, achieving high efficiency in storing the intermediate results.

Hardware accelerators are becoming increasingly important in boosting high performance computing systems. In this study, we develop a parallel explicit finite element (FE) analysis system based on a many integrated core (MIC) architecture for fast simulation of nonlinear dynamic problems of plate and shell structures. To minimize data transfer between heterogeneous architectures, parallel computation of the entire explicit FE calculation is realized by developing a vectorized thread-level parallelism algorithm. The parallelism includes a novel method, based on dependency relationship links, for efficiently solving the explicit shell element equations in parallel. A heterogeneous model is established to overlap data transfer and offloaded computation, and thus reduce the time required to store large intermediate data in simulations of real engineering nonlinear problems. Finally, a high performance nonlinear dynamic simulation system is developed. Simulations of benchmarks and engineering problems show that the proposed parallel computing method makes full use of the hardware performance of the MIC architecture and effectively improves the computational efficiency of an explicit FE solution. For a bus body model containing approximately 3.8 million degrees of freedom, the computation is 17 times faster than sequential CPU computation. The relative speedup grows with the number of threads, and the highest relative speedup exceeds 80.
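The "natural parallelism" of an explicit FE algorithm can be illustrated with a minimal sketch. The abstract does not give the time integration scheme or the data layout, so everything here is an assumption made for illustration: a 1D chain of linear springs with a lumped (diagonal) mass, node 0 fixed, and a central-difference style update. The function name `explicit_step` and its parameters are hypothetical and are not taken from the paper.

```python
def explicit_step(u, v, mass, f_ext, k, dt):
    """One explicit time step for a 1D chain of equal springs (illustrative only).

    u, v   : nodal displacements and velocities (lists of floats)
    mass   : lumped nodal masses, so no linear system has to be solved
    f_ext  : external nodal forces
    k      : spring stiffness (assumed equal for all elements)
    dt     : time step size
    Node 0 is held fixed.
    """
    n = len(u)
    f_int = [0.0] * n

    # Element loop: spring e connects nodes e and e + 1.
    # Neighbouring elements write to a shared node, so this scatter step
    # carries write dependencies when it is run in parallel.
    for e in range(n - 1):
        force = k * (u[e + 1] - u[e])
        f_int[e] -= force
        f_int[e + 1] += force

    # Node loop: every node is updated independently of the others,
    # which is what makes this part easy to thread and vectorize.
    u_new = [0.0] * n
    v_new = [0.0] * n
    for i in range(1, n):  # node 0 stays fixed at zero
        a = (f_ext[i] - f_int[i]) / mass[i]
        v_new[i] = v[i] + dt * a
        u_new[i] = u[i] + dt * v_new[i]
    return u_new, v_new


if __name__ == "__main__":
    u, v = [0.0, 0.0], [0.0, 0.0]
    for _ in range(2):
        u, v = explicit_step(u, v, mass=[1.0, 2.0], f_ext=[0.0, 4.0], k=10.0, dt=0.1)
    print(u, v)
```

In this sketch the node loop has no cross-iteration dependencies, while the element loop accumulates into nodes shared by adjacent elements. Handling that kind of shared-node dependency is the sort of problem the dependency relationship link method in the abstract is described as addressing; the sketch does not attempt to reproduce that method.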
Keywords
Explicit finite element, Intel Xeon Phi coprocessor, MIC, Nonlinear analysis, Parallel computing