ReLU^2 Wins: Discovering Efficient Activation Functions for Sparse LLMs
CoRR (2024)
Abstract
Sparse computation offers a compelling solution for the inference of Large
Language Models (LLMs) in low-resource scenarios by dynamically skipping the
computation of inactive neurons. While traditional approaches focus on
ReLU-based LLMs, leveraging zeros in activation values, we broaden the scope of
sparse LLMs beyond zero activation values. We introduce a general method that
defines neuron activation through neuron output magnitudes and a tailored
magnitude threshold, demonstrating that non-ReLU LLMs also exhibit sparse
activation. To find the most efficient activation function for sparse
computation, we propose a systematic framework to examine the sparsity of LLMs
from three aspects: the trade-off between sparsity and performance, the
predictivity of sparsity, and the hardware affinity. We conduct thorough
experiments on LLMs utilizing different activation functions, including ReLU,
SwiGLU, ReGLU, and ReLU^2. The results indicate that models employing
ReLU^2 excel across all three evaluation aspects, highlighting its potential
as an efficient activation function for sparse LLMs. We will release the code
to facilitate future research.
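
The magnitude-threshold idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definition or its threshold-selection procedure, so everything here is an assumption for illustration: the names `relu2` and `sparsity_ratio`, the threshold value 0.1, and the Gaussian toy pre-activations are all hypothetical. A neuron output is treated as inactive when its magnitude is at or below the threshold; a threshold of 0 recovers the traditional zero-based notion of sparsity.

```python
import random


def relu2(x: float) -> float:
    """Squared ReLU: max(x, 0) squared."""
    return max(x, 0.0) ** 2


def sparsity_ratio(outputs, threshold):
    """Fraction of neuron outputs whose magnitude is at or below the threshold."""
    inactive = sum(1 for v in outputs if abs(v) <= threshold)
    return inactive / len(outputs)


# Toy data (assumption): pre-activations drawn from a standard normal.
random.seed(0)
pre_activations = [random.gauss(0.0, 1.0) for _ in range(1000)]
outputs = [relu2(v) for v in pre_activations]

# Threshold 0 counts only exact zeros; a positive threshold (hypothetical
# value) additionally counts small-magnitude outputs as inactive.
zero_sparsity = sparsity_ratio(outputs, 0.0)
thresholded_sparsity = sparsity_ratio(outputs, 0.1)

print(f"zero-based sparsity:  {zero_sparsity:.3f}")
print(f"thresholded sparsity: {thresholded_sparsity:.3f}")
```

Raising the threshold can only keep the measured sparsity the same or increase it, which is why the choice of threshold has to be weighed against model performance, the first of the three aspects the abstract lists.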