Faster Inference of Integer SWIN Transformer by Removing the GELU Activation
CoRR (2024)
Abstract
The Swin Transformer is a prominent vision transformer model that achieves state-of-the-art accuracy in image classification tasks. Despite this success, its unique architecture causes slower inference compared with similar deep neural networks. Integer quantization of the model is one method used to improve its inference latency. However, state-of-the-art approaches have not been able to fully quantize the model. In this work, we improve upon the inference latency of state-of-the-art methods by removing the floating-point operations associated with the GELU activation in the Swin Transformer. While previous work proposed replacing the non-integer operations with linear approximation functions, we propose to replace GELU with the ReLU activation. The advantage of ReLU over previous methods is its low memory and computation complexity. We use iterative knowledge distillation to compensate for the accuracy lost by replacing GELU with ReLU. We quantize our GELU-less Swin Transformer and show that, on an NVIDIA RTX 4090 GPU, we can improve the inference latency of the quantized Swin Transformer by at least 11% while maintaining an accuracy drop of under 0.5% on the ImageNet evaluation dataset.
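To make the core idea concrete, below is a minimal PyTorch sketch of the GELU-to-ReLU substitution followed by a single knowledge-distillation step, assuming the timm Swin implementation. The model name, hyperparameters, and one-step loop are illustrative placeholders and not the authors' exact iterative distillation or quantization procedure.

```python
import torch
import torch.nn as nn
import timm


def replace_gelu_with_relu(module: nn.Module) -> nn.Module:
    """Recursively swap every nn.GELU submodule for nn.ReLU, in place."""
    for name, child in module.named_children():
        if isinstance(child, nn.GELU):
            setattr(module, name, nn.ReLU(inplace=True))
        else:
            replace_gelu_with_relu(child)
    return module


# Teacher keeps GELU; the student starts from the same weights but uses ReLU.
# (Model variant and pretrained flag are assumptions for illustration.)
teacher = timm.create_model("swin_tiny_patch4_window7_224", pretrained=False)
student = timm.create_model("swin_tiny_patch4_window7_224", pretrained=False)
student.load_state_dict(teacher.state_dict())
replace_gelu_with_relu(student)

teacher.eval()
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
kl_div = nn.KLDivLoss(reduction="batchmean")

# One distillation step on a dummy ImageNet-sized batch; the paper's method
# applies this iteratively over real training data.
images = torch.randn(4, 3, 224, 224)
with torch.no_grad():
    teacher_logits = teacher(images)
student_logits = student(images)
loss = kl_div(
    torch.log_softmax(student_logits, dim=-1),
    torch.softmax(teacher_logits, dim=-1),
)
loss.backward()
optimizer.step()
```

After distillation recovers the accuracy, the resulting GELU-less student contains no GELU-specific floating-point operations in its MLP blocks, which is what allows the subsequent integer quantization to cover the activation path as well.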