Replacing softmax with ReLU in Vision Transformers

CoRR (2023)

Abstract
Previous research observed accuracy degradation when replacing the attention softmax with a point-wise activation such as ReLU. In the context of vision transformers, we find that this degradation is mitigated when dividing by sequence length. Our experiments training small to large vision transformers on ImageNet-21k indicate that ReLU-attention can approach or match the performance of softmax-attention in terms of scaling behavior as a function of compute.
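For concreteness, the mechanism the abstract describes can be sketched as follows: a standard scaled dot-product attention where the softmax is replaced by a pointwise ReLU whose output is divided by the sequence length. This is a minimal single-head illustration assuming JAX; the function name relu_attention and the shapes are illustrative, not taken from the paper.

import jax
import jax.numpy as jnp

def relu_attention(q, k, v):
    # q, k, v: arrays of shape [seq_len, d_head] (single head, illustrative)
    seq_len, d_head = q.shape
    logits = q @ k.T / jnp.sqrt(d_head)  # scaled dot-product logits, as usual
    # Pointwise ReLU instead of softmax; dividing by seq_len is the
    # normalization the abstract reports as mitigating the accuracy drop.
    weights = jax.nn.relu(logits) / seq_len
    return weights @ v

# Usage example (hypothetical shapes):
q = k = v = jnp.ones((16, 64))
out = relu_attention(q, k, v)  # shape (16, 64)

Note that, unlike softmax, the ReLU weights need not sum to one across keys; the division by sequence length keeps their scale roughly independent of context size, which is the property the abstract highlights.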
Keywords
softmax, relu, vision