Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021)

Abstract
Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have been focused on increasing the receptive field, through either dilated/atrous convolutions or inserting attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes. Particularly, we achieve the first position in the highly competitive ADE20K test server leaderboard on the day of submission.
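To make the patch-sequence idea concrete, the following is a minimal PyTorch sketch of the pipeline the abstract describes: an image is split into non-overlapping patches, embedded as a token sequence, processed by a pure transformer encoder with global self-attention in every layer, and mapped back to a pixel-level prediction by a simple decoder. The class name, layer counts, embedding dimension, and the bilinear-upsampling head are illustrative placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchSequenceSegmenter(nn.Module):
    """Hypothetical sketch of a SETR-style model: transformer encoder over
    patch tokens plus a simple (naive) decoder. Hyperparameters are placeholders."""

    def __init__(self, img_size=480, patch_size=16, embed_dim=768,
                 depth=12, num_heads=12, num_classes=150):
        super().__init__()
        self.grid = img_size // patch_size  # tokens per spatial side

        # Patchify the image into a token sequence; no further resolution reduction.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.grid * self.grid, embed_dim))

        # Pure transformer encoder: global context is modeled in every layer.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           dim_feedforward=4 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

        # Simple decoder: per-token classification, reshape to a 2D map, upsample.
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        b, _, h, w = x.shape
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, C)
        tokens = self.encoder(tokens + self.pos_embed)             # global attention per layer
        logits = self.classifier(tokens)                           # (B, N, num_classes)
        logits = logits.transpose(1, 2).reshape(b, -1, self.grid, self.grid)
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = PatchSequenceSegmenter()
    out = model(torch.randn(1, 3, 480, 480))
    print(out.shape)  # (1, 150, 480, 480): per-pixel class logits
```

The sketch corresponds to the simplest ("naive upsampling") decoder variant; the paper also reports stronger decoders, but the sequence-to-sequence encoder is the part that replaces the FCN backbone.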
Keywords
sequence-to-sequence perspective, semantic segmentation methods, fully-convolutional network, encoder-decoder architecture, spatial resolution, receptive fields, context modeling, receptive field, inserting attention modules, encoder-decoder based FCN architecture, alternative perspective, sequence-to-sequence prediction task, pure transformer, convolution, resolution reduction, simple decoder, powerful segmentation model, termed segmentation transformer