The Case for Co-Designing Model Architectures with Hardware
CoRR (2024)
Abstract
While GPUs are responsible for training the vast majority of state-of-the-art
deep learning models, the implications of their architecture are often
overlooked when designing new deep learning (DL) models. As a consequence,
modifying a DL model to be more amenable to the target hardware can
significantly improve the runtime performance of DL training and inference. In
this paper, we provide a set of guidelines for users to maximize the runtime
performance of their transformer models. These guidelines have been created by
carefully considering the impact of various model hyperparameters controlling
model shape on the efficiency of the underlying computation kernels executed on
the GPU. We find that models with efficient shapes achieve up to 39% higher
throughput than models with a similar number of parameters but unoptimized
shapes, while preserving accuracy.
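For intuition, the sketch below illustrates the kind of shape check such guidelines imply: GEMM dimensions that are multiples of the GPU's tensor-core tile size tend to map onto faster kernels. This is a minimal illustration under common rules of thumb, not the paper's method; the ALIGN constant, the multiple-of-8 head-dimension heuristic, and the helper names are assumptions for this example.

# Illustrative shape checker (assumed helper, not from the paper).
# ALIGN = 64 is a commonly cited tile-friendly multiple; the right
# value depends on GPU generation and data type.
ALIGN = 64


def pad_to_multiple(dim: int, multiple: int = ALIGN) -> int:
    """Round dim up to the nearest multiple of `multiple`."""
    return ((dim + multiple - 1) // multiple) * multiple


def check_transformer_shape(hidden_size: int, num_heads: int, vocab_size: int) -> None:
    """Flag shape choices likely to fall off the fast GEMM path."""
    if hidden_size % ALIGN != 0:
        print(f"hidden_size={hidden_size}: consider {pad_to_multiple(hidden_size)}")
    if hidden_size % num_heads != 0:
        print(f"hidden_size={hidden_size} is not divisible by num_heads={num_heads}")
    elif (hidden_size // num_heads) % 8 != 0:
        # Assumed heuristic: head dims that are not multiples of 8 can
        # hit slower attention kernels.
        print(f"head_dim={hidden_size // num_heads}: prefer a multiple of 8")
    if vocab_size % ALIGN != 0:
        print(f"vocab_size={vocab_size}: consider padding to {pad_to_multiple(vocab_size)}")


if __name__ == "__main__":
    # Example: a slightly "off" shape triggers all three suggestions,
    # e.g. padding the GPT-2 vocabulary (50257) up to 50304.
    check_transformer_shape(hidden_size=2562, num_heads=32, vocab_size=50257)

Padding the vocabulary, as in the usage example above, is a well-known instance of this pattern: the extra rows of the embedding and output matrices cost a little memory but keep the largest GEMMs on the aligned fast path.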