A General and Efficient Training for Transformer via Token Expansion
CVPR 2024 (2024)
Abstract
The remarkable performance of Vision Transformers (ViTs) typically requires
an extremely large training cost. Existing methods have attempted to accelerate
the training of ViTs, yet they typically disregard method universality or suffer
accuracy drops. Meanwhile, they break the training consistency of the original
transformers, including the consistency of hyper-parameters, architecture, and
strategy, which prevents them from being widely applied to different
Transformer networks. In this paper, we propose a novel token growth scheme
Token Expansion (termed ToE) to achieve consistent training acceleration for
ViTs. We introduce an "initialization-expansion-merging" pipeline to maintain
the integrity of the intermediate feature distribution of original
transformers, preventing the loss of crucial learnable information in the
training process. ToE can not only be seamlessly integrated into the training
and fine-tuning process of transformers (e.g., DeiT and LV-ViT), but also
effective for efficient training frameworks (e.g., EfficientTrain), without
twisting the original training hyper-parameters, architecture, and introducing
additional training strategies. Extensive experiments demonstrate that ToE
accelerates the training of ViTs by about 1.3x in a lossless manner, or even
achieves performance gains over the full-token training baselines. Code is
available at https://github.com/Osilly/TokenExpansion.
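The abstract does not spell out the algorithmic details, but the sketch below illustrates what an "initialization-expansion-merging" token schedule could look like in PyTorch. The selection and merging rules used here (uniform seeding, cosine-similarity-based expansion, nearest-token averaging), the keep-ratio schedule, and all function names are assumptions made for illustration only; they are not the authors' implementation, which is available in the linked repository.

```python
# A minimal illustrative sketch (not the released ToE code) of an
# "initialization-expansion-merging" token schedule in PyTorch.
# All names and the selection/merging heuristics below are assumptions.

import torch


def toe_like_token_selection(tokens: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Reduce [B, N, D] tokens to roughly keep_ratio * N tokens.

    Initialization: seed the kept set with uniformly strided tokens.
    Expansion: greedily add the token least similar to the kept set, so
        diverse parts of the feature distribution stay represented.
    Merging: average every dropped token into its nearest kept token, so
        its information is not simply discarded.
    """
    B, N, D = tokens.shape
    n_keep = max(1, int(N * keep_ratio))
    n_init = max(1, n_keep // 2)

    # --- initialization: uniform stride over the token sequence ---
    init_idx = torch.linspace(0, N - 1, n_init, device=tokens.device).long()
    keep_idx = init_idx.unsqueeze(0).expand(B, -1).clone()            # [B, n_init]

    normed = torch.nn.functional.normalize(tokens, dim=-1)            # cosine space
    for _ in range(n_keep - n_init):
        kept = torch.gather(normed, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
        # similarity of each token to its closest already-kept token
        sim = torch.einsum("bnd,bkd->bnk", normed, kept).max(dim=-1).values
        sim.scatter_(1, keep_idx, float("inf"))                       # exclude kept
        # --- expansion: add the least-covered token per batch element ---
        new_idx = sim.argmin(dim=1, keepdim=True)                     # [B, 1]
        keep_idx = torch.cat([keep_idx, new_idx], dim=1)

    # --- merging: average every token into its nearest kept token ---
    kept_normed = torch.gather(normed, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    assign = torch.einsum("bnd,bkd->bnk", normed, kept_normed).argmax(dim=-1)
    merged = torch.zeros(B, n_keep, D, device=tokens.device, dtype=tokens.dtype)
    counts = torch.zeros(B, n_keep, device=tokens.device, dtype=tokens.dtype)
    merged.scatter_add_(1, assign.unsqueeze(-1).expand(-1, -1, D), tokens)
    counts.scatter_add_(1, assign, torch.ones(B, N, device=tokens.device,
                                              dtype=tokens.dtype))
    return merged / counts.clamp(min=1).unsqueeze(-1)


# Hypothetical usage: keep ~50% of tokens early in training, then raise the
# ratio back toward 1.0 in later epochs so the model ends on full-token input.
x = torch.randn(2, 196, 384)           # ViT-small-like patch tokens
out = toe_like_token_selection(x, keep_ratio=0.5)
print(out.shape)                       # torch.Size([2, 98, 384])
```

The intent of the merging step in this sketch is to keep the reduced token set representative of the full intermediate feature distribution, which is the property the paper emphasizes; the exact mechanism used by ToE may differ.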