Considerations on the Implementation and Use of Anderson Acceleration on Distributed Memory and GPU-based Parallel Computers

ADVANCES IN THE MATHEMATICAL SCIENCES (2016)

Abstract
Recent work suggests that Anderson acceleration can be used to accelerate fixed-point iterative methods. To improve the viability of the algorithm, we seek to improve its computational efficiency on parallel machines. The primary kernel of the method is a least-squares minimization within the main loop. We consider two approaches to reduce its cost: the first uses a communication-avoiding QR factorization, and the second employs a GMRES-like restarting procedure. On problems using 1,000 processors or fewer, we find the amount of communication too low to justify communication avoidance. The restarting procedure likewise proves no better than current approaches unless the cost of the function evaluation is very small. To begin taking advantage of current trends in machine architecture, we also studied a first-attempt single-node GPU implementation of Anderson acceleration. Performance results show that, for sufficiently large problems, a GPU implementation can provide a significant performance increase over CPU versions due to the GPU's higher memory bandwidth.
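For context, the method and its least-squares kernel can be sketched as follows. This is a minimal serial NumPy illustration of Anderson acceleration, not the authors' parallel implementation: the window size m, the tolerance, and the dense lstsq solve are illustrative assumptions, and the lstsq call marks exactly the cost the paper attacks with a communication-avoiding QR and with restarting.

import numpy as np

def anderson_accelerate(g, x0, m=5, max_iter=50, tol=1e-10):
    # Anderson acceleration for the fixed-point problem x = g(x).
    x = np.asarray(x0, dtype=float)
    G_hist, F_hist = [], []          # g(x_k) values and residuals f_k = g(x_k) - x_k
    for _ in range(max_iter):
        gx = g(x)
        f = gx - x
        G_hist.append(gx)
        F_hist.append(f)
        if np.linalg.norm(f) < tol:
            break
        if len(F_hist) > m + 1:      # keep a sliding window of at most m differences
            G_hist.pop(0)
            F_hist.pop(0)
        mk = len(F_hist) - 1
        if mk == 0:
            x = gx                   # no history yet: plain fixed-point step
            continue
        # Least-squares kernel inside the main loop -- the method's primary
        # cost.  Minimize ||dF @ gamma - f|| over the recent residual differences.
        dF = np.column_stack([F_hist[j + 1] - F_hist[j] for j in range(mk)])
        dG = np.column_stack([G_hist[j + 1] - G_hist[j] for j in range(mk)])
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x = gx - dG @ gamma          # accelerated update
    return x

# Example: accelerate the contraction x = cos(x); converges to ~0.739085.
print(anderson_accelerate(np.cos, np.ones(4)))

The communication-avoiding idea referenced in the abstract can likewise be sketched. The following one-level TSQR factors a tall-skinny matrix by QR-ing independent row blocks and then QR-ing the stacked small R factors; on a distributed machine each block would live on a different processor and only the small R factors are communicated. The block count and the NumPy-only reduction are assumptions for illustration, not the paper's implementation.

def tsqr(A, n_blocks=4):
    # One-level TSQR for tall-skinny A (many more rows than columns).
    blocks = np.array_split(A, n_blocks, axis=0)
    local = [np.linalg.qr(B) for B in blocks]      # independent local QRs
    R_stack = np.vstack([R for _, R in local])
    Q2, R = np.linalg.qr(R_stack)                  # reduce the small R factors
    # Recover the global Q by applying the rows of Q2 to each local Q.
    n = A.shape[1]
    Q = np.vstack([Q1 @ Q2[i * n:(i + 1) * n] for i, (Q1, _) in enumerate(local)])
    return Q, R

The abstract's finding is that at 1,000 processors or fewer, the communication this pattern avoids is too small a fraction of the runtime to pay off.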
Keywords
Anderson acceleration, Nonlinear solvers, Fixed-point iteration, TSQR