EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction.

Han Cai, Junyan Li, Muyan Hu,Chuang Gan,Song Han

ICCV(2023)

引用 3|浏览52
暂无评分
摘要
High-resolution dense prediction enables many appealing real-world applications, such as computational photography, autonomous driving, etc. However, the vast computational cost makes deploying state-of-the-art high-resolution dense prediction models on hardware devices difficult. This work presents EfficientViT, a new family of high-resolution vision models with novel lightweight multi-scale attention. Unlike prior high-resolution dense prediction models that rely on heavy self-attention, hardware-inefficient large-kernel convolution, or complicated topology structure to obtain good performances, our lightweight multi-scale attention achieves a global receptive field and multi-scale learning (two critical features for high-resolution dense prediction) with only lightweight and hardware-efficient operations. As such, EfficientViT delivers remarkable performance gains over previous state-of-the-art high-resolution dense prediction models with significant speedup on diverse hardware platforms, including mobile CPU, edge GPU, and cloud GPU. Without performance loss on Cityscapes, our EfficientViT provides up to 8.8× and 3.8× GPU latency reduction over SegFormer and SegNeXt, respectively. For super-resolution, EfficientViT provides up to 6.4× speedup over Restormer while providing 0.11dB gain in PSNR.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要