The Case for Co-Designing Model Architectures with Hardware
CoRR (2024)
Abstract
While GPUs are responsible for training the vast majority of state-of-the-art
deep learning models, the implications of their architecture are often
overlooked when designing new deep learning (DL) models. As a consequence,
modifying a DL model to be more amenable to the target hardware can
significantly improve the runtime performance of DL training and inference. In
this paper, we provide a set of guidelines for users to maximize the runtime
performance of their transformer models. These guidelines have been created by
carefully considering the impact of various model hyperparameters controlling
model shape on the efficiency of the underlying computation kernels executed on
the GPU. We find that models with efficient shapes achieve up to 39% higher
throughput than models with a similar number of parameters but unoptimized
shapes, while preserving accuracy.
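For intuition, the sketch below illustrates the kind of shape check such guidelines imply: GEMM dimensions that are multiples of the GPU's tensor-core tile size tend to map onto faster kernels. This is a minimal illustration under common rules of thumb, not the paper's method; the ALIGN constant, the multiple-of-8 head-dimension heuristic, and the helper names are assumptions for this example.

# Illustrative shape checker (assumed helper, not from the paper).
# ALIGN = 64 is a commonly cited tile-friendly multiple; the right
# value depends on GPU generation and data type.
ALIGN = 64


def pad_to_multiple(dim: int, multiple: int = ALIGN) -> int:
    """Round dim up to the nearest multiple of `multiple`."""
    return ((dim + multiple - 1) // multiple) * multiple


def check_transformer_shape(hidden_size: int, num_heads: int, vocab_size: int) -> None:
    """Flag shape choices likely to fall off the fast GEMM path."""
    if hidden_size % ALIGN != 0:
        print(f"hidden_size={hidden_size}: consider {pad_to_multiple(hidden_size)}")
    if hidden_size % num_heads != 0:
        print(f"hidden_size={hidden_size} is not divisible by num_heads={num_heads}")
    elif (hidden_size // num_heads) % 8 != 0:
        # Assumed heuristic: head dims that are not multiples of 8 can
        # hit slower attention kernels.
        print(f"head_dim={hidden_size // num_heads}: prefer a multiple of 8")
    if vocab_size % ALIGN != 0:
        print(f"vocab_size={vocab_size}: consider padding to {pad_to_multiple(vocab_size)}")


if __name__ == "__main__":
    # Example: a slightly "off" shape triggers all three suggestions,
    # e.g. padding the GPT-2 vocabulary (50257) up to 50304.
    check_transformer_shape(hidden_size=2562, num_heads=32, vocab_size=50257)

Padding the vocabulary, as in the usage example above, is a well-known instance of this pattern: the extra rows of the embedding and output matrices cost a little memory but keep the largest GEMMs on the aligned fast path.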