TT-ViT: Vision Transformer Compression Using Tensor-Train Decomposition

Hoang Pham Minh, Nguyen Nguyen Xuan, Tran Thai Son

International Conference on Computational Collective Intelligence (ICCCI) (2022)

Abstract
Inspired by the Transformer, one of the most successful deep learning models in natural language processing, machine translation, and related areas, the Vision Transformer (ViT) has recently demonstrated its effectiveness in computer vision tasks such as image classification and object detection. However, a major drawback of ViT is that it requires a massive number of trainable parameters. In this paper, we propose a novel compressed ViT model, named Tensor-train ViT (TT-ViT), based on tensor-train (TT) decomposition. For a multi-head self-attention layer, instead of storing the full trainable weight matrices, we represent them in TT format via their TT cores, which use far fewer parameters. The results of our experiments on the CIFAR-10 and Fashion-MNIST datasets reveal that TT-ViT achieves outstanding performance, matching the accuracy of its baseline model while using only half as many parameters in total.
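The core idea — storing a layer's weight matrix as a chain of small TT cores rather than as one dense array — can be illustrated with a minimal NumPy sketch. The dimension factorizations, TT rank, and two-core setup below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# A weight matrix W (d_in x d_out) is viewed as a 4-way tensor by
# factorizing d_in = m1*m2 and d_out = n1*n2, then stored as two
# TT cores instead of the full dense matrix.
m1, m2 = 8, 8        # factorization of d_in = 64
n1, n2 = 8, 8        # factorization of d_out = 64
r = 4                # TT rank (assumed, controls the compression ratio)

rng = np.random.default_rng(0)
G1 = rng.standard_normal((m1, n1, r))   # first TT core
G2 = rng.standard_normal((r, m2, n2))   # second TT core

def tt_matrix(G1, G2):
    """Contract the TT cores back into the full (m1*m2, n1*n2) matrix."""
    # W[(i1,i2),(j1,j2)] = sum_a G1[i1, j1, a] * G2[a, i2, j2]
    W4 = np.einsum('ija,akl->ikjl', G1, G2)   # shape (m1, m2, n1, n2)
    return W4.reshape(m1 * m2, n1 * n2)

W = tt_matrix(G1, G2)

full_params = W.size                 # 64 * 64 = 4096
tt_params = G1.size + G2.size        # 8*8*4 + 4*8*8 = 512
print(f"dense: {full_params} params, TT: {tt_params} params")
```

In a TT layer the dense matrix is never materialized at inference time; the input is reshaped and contracted against the cores directly, so both storage and (for small ranks) compute shrink. Here the two cores hold 512 values versus 4096 for the dense matrix, an 8x reduction at rank 4.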
Keywords
Vision transformer, Tensor decomposition, Tensor-train decomposition, Model compression