Small Object Detector Using Contextual Local Features and Global Representations for Autonomous Driving

2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC)(2023)

Abstract
We present a hybrid architecture called the Contextual-Enhanced Transformer Network (CEFormer) that leverages both Convolutional Neural Networks (CNNs) and transformer-style networks for computer vision tasks. CNNs excel at modeling local features thanks to their locality and weight sharing, while transformers excel at capturing global context through self-attention. CEFormer uses a parallel network structure that combines the strengths of both CNNs and transformers for image feature representation. Specifically, we design an enhanced multi-headed attention module and a contextual attention module that extract and enhance global features and contextual features on two branches for small object detection in autonomous driving environments. Moreover, we propose a lightweight cross-branch fusion module that reduces the parameter count and computational complexity of the feature interaction. CEFormer achieves competitive object detection results with Mask R-CNN and outperforms ResNet- and transformer-based backbones. It also shows significant improvement over other methods on the MS COCO, TT100K, and ImageNet datasets.
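The abstract describes a two-branch design: a convolutional branch for local features, an attention branch for global context, and a lightweight module that fuses them. Since the paper's exact layer definitions are not given here, the following is only a minimal NumPy sketch of that general pattern; all function names, shapes, and the concatenate-then-project fusion are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_branch(x, kernel):
    """Local features via a 3x3 spatial filter shared across channels
    (stand-in for the CNN branch; real models use learned multi-channel convs)."""
    H, W, C = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 3, j:j + 3, :]  # 3x3 neighborhood, all channels
            out[i, j] = np.tensordot(kernel, patch, axes=([0, 1], [0, 1]))
    return out

def global_branch(x, Wq, Wk, Wv):
    """Global context via single-head self-attention over flattened spatial tokens
    (the paper uses an enhanced multi-head variant; one head keeps the sketch short)."""
    H, W, C = x.shape
    tokens = x.reshape(H * W, C)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(C))      # every token attends to every other
    return (attn @ v).reshape(H, W, C)

def parallel_block(x, kernel, Wq, Wk, Wv, Wf):
    """Run both branches in parallel, then fuse with a cheap 1x1-style projection
    (hypothetical stand-in for the paper's cross-branch fusion module)."""
    local_feat = local_branch(x, kernel)          # contextual local features
    global_feat = global_branch(x, Wq, Wk, Wv)    # global representation
    fused = np.concatenate([local_feat, global_feat], axis=-1)  # (H, W, 2C)
    return fused @ Wf                             # project back to (H, W, C)

# Toy usage: an 4x4 feature map with 8 channels.
rng = np.random.default_rng(0)
H, W, C = 4, 4, 8
x = rng.standard_normal((H, W, C))
out = parallel_block(
    x,
    kernel=rng.standard_normal((3, 3)) * 0.1,
    Wq=rng.standard_normal((C, C)) * 0.1,
    Wk=rng.standard_normal((C, C)) * 0.1,
    Wv=rng.standard_normal((C, C)) * 0.1,
    Wf=rng.standard_normal((2 * C, C)) * 0.1,
)
print(out.shape)  # (4, 4, 8) — same spatial resolution and channel count as the input
```

The per-pixel Python loops are for clarity only; the point of the sketch is the data flow: both branches see the same input, and fusion happens once per block rather than inside each branch, which is what keeps the interaction lightweight.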
Keywords
Local Features,Object Detection,Autonomous Vehicles,Small Objects,Global Representation,Convolutional Network,Convolutional Neural Network,Image Features,Feature Representation,Attention Mechanism,Global Features,Attention Module,ImageNet Dataset,Parallel Network,Mask R-CNN,COCO Dataset,MS COCO Dataset,Contextual Attention,Local Information,Parallelization,Global Information,Small Object Detection,NLP Tasks,Convolution Operation,Local Feature Extraction,Local Module,Transformer Block,Backbone Network,Global Attention,Attention Network