Complex Scene Segmentation With Local to Global Self-Attention Module and Feature Alignment Module.

IEEE Access (2023)

Abstract
It is challenging to accurately model both local and global context in complex scene segmentation. To address this problem, this paper proposes a scene semantic segmentation network containing a local-to-global self-attention module and a feature alignment module. The local-to-global self-attention module is designed to combine local and global features; within it, the transformer backbone treats all patches equally at the global scope to extract high-level features. An improved masked transformer with a feature alignment module (MtFAM), which combines the masked transformer and the feature alignment module into a new decoder structure, is designed to fuse the features obtained from the vision transformer backbone and the local-to-global self-attention module. Experimental results demonstrate that the proposed structure performs better, improving mIoU by 3.63% on the ADE20K validation set compared with ViT-Tiny. In particular, it achieves a 2.23% higher mIoU than the Segmenter method using the same transformer backbone on this challenging scene segmentation benchmark.
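The paper does not include code, but the core idea of combining window-local attention with global attention over all patches can be sketched as below. This is a minimal illustration, not the authors' actual module: the window size, the use of plain scaled dot-product attention, and the additive fusion of local and global outputs are all assumptions made for clarity.

```python
import numpy as np


def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # scaled dot-product self-attention over token rows
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def local_to_global_attention(tokens, window=4):
    """Hypothetical local-to-global self-attention sketch:
    attention inside each fixed window captures local context,
    full attention over all tokens (patches treated equally)
    captures global context; the two are fused by addition."""
    n, d = tokens.shape
    local = np.zeros_like(tokens)
    for start in range(0, n, window):
        w = tokens[start:start + window]
        local[start:start + window] = attention(w, w, w)
    global_ = attention(tokens, tokens, tokens)
    return local + global_


# toy usage: 16 patch embeddings of dimension 8
rng = np.random.default_rng(0)
patches = rng.standard_normal((16, 8))
fused = local_to_global_attention(patches)
```

The output keeps the input token shape, so the fused features can be passed on to a decoder such as the MtFAM structure described in the abstract.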
Keywords
complex scene segmentation, feature alignment module, self-attention