Enhancing the Performance of AlphaFold Through Modified Storage Method and Optimization of HHblits on TSUBAME3.0 Supercomputer

2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE)(2023)

Abstract
Knowledge of the three-dimensional structures of proteins is vital for understanding their functions, and it can also serve as a basis for understanding the various functions of the human body. The three-dimensional structures of proteins with known amino acid sequences have been determined experimentally. To reduce the time and cost of experiments, computer-based methods such as AlphaFold have been proposed. AlphaFold uses existing tools such as HHblits to obtain multiple sequence alignments (MSAs) from huge protein sequence databases, such as BFD. However, HHblits requires a long execution time because it performs many I/O operations, and its execution time differs significantly depending on the type of storage that holds the protein sequence database. Notably, on the TSUBAME3.0 Lustre storage area, the execution time of HHblits with default settings differs significantly from that with stripe settings. Therefore, we modified the storage method of the protein sequence database, choosing among the methods available on TSUBAME3.0, and measured the execution time of HHblits for each storage method. Furthermore, we profiled HHblits to identify its bottlenecks and, based on the results, tuned the number of parallel processes, modified the database arrangement, and optimized sorting. We also made tool execution asynchronous, taking into account the data dependencies among the MSA acquisition tools of AlphaFold. Consequently, we succeeded in halving the execution time, on average, when predicting a three-dimensional structure from a single amino acid sequence on TSUBAME3.0. The modifications proposed in this study have been submitted as pull requests at the following URLs: HHblits (https://github.com/soedinglab/hh-suite/pull/307) and AlphaFold (https://github.com/deepmind/alphafold/pull/399).
Keywords
Protein tertiary structure prediction, Drug Discovery, Optimization, GPGPU, Supercomputer
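
The abstract mentions making tool execution asynchronous. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes the search tasks do not consume each other's outputs, and it replaces the real tools with a hypothetical stand-in function (`run_tool`) that only sleeps. The tool names and durations are invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_tool(name, seconds):
    """Hypothetical stand-in for one MSA search tool; it sleeps instead of searching."""
    time.sleep(seconds)
    return f"{name}: done"


def run_sequential(tools):
    """Run each task one after another, waiting for each to finish."""
    return [run_tool(name, seconds) for name, seconds in tools]


def run_concurrent(tools):
    """Submit all tasks at once; assumes no task needs another task's output."""
    with ThreadPoolExecutor(max_workers=len(tools)) as pool:
        futures = [pool.submit(run_tool, name, seconds) for name, seconds in tools]
        # Collect results in submission order so the output matches run_sequential.
        return [future.result() for future in futures]


def timed(fn, tools):
    """Return (results, elapsed_seconds) for one of the runners above."""
    start = time.perf_counter()
    results = fn(tools)
    return results, time.perf_counter() - start


# Hypothetical task names and durations, chosen only for illustration.
TOOLS = [("search_a", 0.2), ("search_b", 0.3), ("search_c", 0.1)]

if __name__ == "__main__":
    seq_results, seq_time = timed(run_sequential, TOOLS)
    con_results, con_time = timed(run_concurrent, TOOLS)
    print(f"sequential: {seq_time:.2f} s, concurrent: {con_time:.2f} s")
```

In this sketch the concurrent run takes roughly as long as its slowest task, while the sequential run takes the sum of all tasks. With real tools, the achievable gain would also depend on how much they compete for I/O bandwidth and CPU cores.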