Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search
arXiv (2024)
Abstract
Image segmentation is one of the most fundamental problems in computer vision
and has attracted considerable attention due to its broad applications in image
understanding and autonomous driving. However, designing effective and
efficient segmentation neural architectures is a labor-intensive process that
may require many trials by human experts. In this paper, we address the
challenge of efficiently integrating multi-head self-attention into
high-resolution representation CNNs by leveraging architecture search. Manually
replacing convolution layers with multi-head self-attention is non-trivial due
to the high memory cost of maintaining high resolution. By contrast, we
develop a multi-target multi-branch supernet method, which not only fully
utilizes the advantages of high-resolution features, but also finds the proper
locations for placing multi-head self-attention modules. Our search algorithm is
optimized towards multiple objectives (e.g., latency and mIoU) and is capable of
finding architectures on the Pareto frontier with an arbitrary number of branches in a
single search. We further present a series of models found via the Hybrid
Convolutional-Transformer Architecture Search (HyCTAS) method, which searches for
the best hybrid combination of lightweight convolution layers and
memory-efficient self-attention layers between branches of different
resolutions and fuses them to high resolution for both efficiency and effectiveness.
Extensive experiments demonstrate that HyCTAS outperforms previous methods on
the semantic segmentation task. Code and models are available at
.
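
To illustrate what "architectures on the Pareto frontier" means for the two objectives named above (latency and mIoU), here is a minimal sketch in Python. It is not the HyCTAS search algorithm. The candidate names and the latency and mIoU numbers are hypothetical values made up for the example, and the brute-force filter is only one simple way to extract a Pareto front from already-evaluated candidates.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """An evaluated architecture (all values below are hypothetical)."""
    name: str
    latency_ms: float  # lower is better
    miou: float        # higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on both objectives
    and strictly better on at least one."""
    no_worse = a.latency_ms <= b.latency_ms and a.miou >= b.miou
    strictly_better = a.latency_ms < b.latency_ms or a.miou > b.miou
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates,
    sorted by increasing latency."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.latency_ms)


# Made-up measurements, for illustration only.
candidates = [
    Candidate("arch_a", 10.0, 0.70),
    Candidate("arch_b", 12.0, 0.68),  # slower and less accurate than arch_a
    Candidate("arch_c", 20.0, 0.78),
    Candidate("arch_d", 25.0, 0.77),  # slower and less accurate than arch_c
    Candidate("arch_e", 30.0, 0.80),
]

front_names = [c.name for c in pareto_front(candidates)]
print(front_names)
```

Each architecture left on the front represents a different latency/accuracy trade-off: moving along it, accuracy can only be gained by accepting higher latency.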