FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators
arXiv (2024)
Abstract
NVIDIA Tensor Cores and AMD Matrix Cores (together called Matrix
Accelerators) are of growing interest in high-performance computing and machine
learning owing to their high performance. Unfortunately, their numerical
behaviors are not publicly documented, including the number of extra precision
bits maintained, the accumulation order of addition, and predictable subnormal
number handling during computations. This makes it impossible to reliably port
codes across these differing accelerators. This paper contributes a collection
of Feature Targeted Tests for Numerical Properties that help
determine these features across five floating-point formats and four rounding
modes, along with additional tests that highlight the rounding behaviors and
preservation of extra precision bits. To show the practical relevance of FTTN,
we design a simple matrix-multiplication test based on insights gathered from
our feature tests. We executed this very simple test on five platforms, producing
different answers: V100, A100, and MI250X produced 0, MI100 produced 255.875,
and Hopper H100 produced 191.875. Our matrix multiplication tests employ
patterns found in iterative refinement-based algorithms, highlighting the need
to check for significant result variability when porting code across GPUs.
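
To make the features named above concrete, the following is a minimal CPU-side sketch of how accumulation order and extra precision bits can change the result of one dot product. It is not the paper's test and does not reproduce the GPU results quoted above; it only emulates binary32 rounding in plain Python. The helper names (`f32`, `dot_sequential`, `dot_pairwise`, `dot_wide`) and the input vectors are hypothetical and chosen purely for illustration.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_sequential(a, b):
    """Accumulate left to right, rounding to binary32 after every add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = f32(acc + f32(x * y))
    return acc


def dot_pairwise(a, b):
    """Accumulate as a binary tree, rounding to binary32 after every add."""
    vals = [f32(x * y) for x, y in zip(a, b)]
    while len(vals) > 1:
        nxt = [f32(vals[i] + vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:  # carry an unpaired last element to the next level
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]


def dot_wide(a, b):
    """Accumulate in binary64 (extra precision bits), round to binary32 once."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return f32(acc)


# Illustrative inputs (an assumption, not taken from the paper):
# 2**24 is the point where binary32 can no longer represent every integer.
a = [2.0**24, 1.0, 1.0, -(2.0**24)]
b = [1.0, 1.0, 1.0, 1.0]

print(dot_sequential(a, b))  # 0.0
print(dot_pairwise(a, b))    # 1.0
print(dot_wide(a, b))        # 2.0
```

The exact sum is 2, yet the three strategies return 0, 1, and 2 for the same inputs. Which of these behaviors a given matrix accelerator exhibits is the kind of undocumented property that feature-targeted tests are meant to reveal.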