Dual-resolution transformer combined with multi-layer separable convolution fusion network for real-time semantic segmentation

Computers & Graphics (2024)

Abstract
Environmental perception is crucial for unmanned mobile platforms such as autonomous vehicles and robots, and precise, fast semantic segmentation of the surrounding scene is a key task for enhancing this capability. Existing real-time semantic segmentation networks are typically based on convolutional neural networks (CNNs), which achieve good results but still lack control over global context features. In recent years, the Transformer architecture has achieved significant success in capturing global context, which benefits segmentation accuracy. However, Transformers tend to ignore local connections, and their computational complexity makes real-time segmentation challenging. We propose a lightweight real-time semantic segmentation network called DTMC-Net, which combines the advantages of CNNs and Transformers. We design a special residual convolution module, the Lightweight Multi-layer Separable Convolution Attention (LMSCA) module, to reduce the parameter count and perform multi-scale feature fusion that captures local features effectively. We introduce the Simple Dual-Resolution Transformer (SDR Transformer), which uses lightweight attention mechanisms and residual feedforward networks to capture and maintain features, with multiple bilateral fusions between its two branches to exchange information. The proposed Anti-artifact Aggregation Pyramid Pooling Module (AAPPM) optimizes the upsampling process, refines features, and performs multi-scale feature fusion again. DTMC-Net contains only 4.2M parameters and achieves good performance on multiple public datasets covering different scenarios.
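The abstract names the LMSCA module but does not detail its layers. As a minimal illustrative sketch only (not the authors' implementation, and with all layer choices, dilation rates, the reduction ratio, and the class name assumed for illustration), the block below shows the general pattern such a module typically follows: depthwise-separable convolutions at two scales for local multi-scale fusion, a lightweight channel attention, and a residual connection to keep the parameter count low.

```python
# Illustrative sketch, not the paper's LMSCA: depthwise-separable convs at two
# dilation rates (multi-scale local features) + lightweight channel attention.
import torch
import torch.nn as nn


class SeparableConvAttentionBlock(nn.Module):
    """Hypothetical residual block in the spirit of the abstract's description."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Depthwise + pointwise pair, dilation 1 (fine local detail)
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Depthwise + pointwise pair, dilation 2 (larger receptive field)
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2, groups=channels, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Lightweight squeeze-and-excitation style channel attention
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = self.branch1(x) + self.branch2(x)  # multi-scale feature fusion
        out = fused * self.attention(fused)        # channel reweighting
        return out + x                             # residual connection


if __name__ == "__main__":
    block = SeparableConvAttentionBlock(64)
    y = block(torch.randn(1, 64, 128, 256))
    print(y.shape)  # torch.Size([1, 64, 128, 256])
```

Because the convolutions are depthwise-separable, each branch costs roughly a 3x3 depthwise kernel plus a 1x1 pointwise projection per channel, which is the kind of parameter saving the abstract attributes to LMSCA; the exact structure in DTMC-Net may differ.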
Keywords
Lightweight neural network, Transformer, Multi-scale feature fusion, Real-time semantic segmentation, Environmental perception