CEAFFOD: Cross-Ensemble Attention-based Feature Fusion Architecture Towards a Robust and Real-time UAV-based Object Detection in Complex Scenarios

Ahmed Elhagryl,Hang Dai,Abdulmotaleb El Saddik,Wail Gueaieb,Giulia De Masi

2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA（2023）

引用 1|浏览7

暂无评分

摘要

Deploying object detectors in embedded devices such as unmanned aerial vehicles (UAVs) comes with many challenges. This is due to both the UAV itself having low embedded resources in terms of computation and memory, and also due to the nature of the captured visual data with the variations in objects' scale, orientation, density, viewpoint, distribution, shape, context and others. It is crucial for the object detector to be robust with high accuracy, real-time with fast inference and light-weight to be applicable. Inspired by YOLO architecture, we propose a novel single-stage detection architecture. Our contributions are, first, feature fusion spatial pyramid pooling (FFSPP) block that applies attention-based feature fusion across both time and space utilizing the information of subsequent frames and scales in an efficient manner. Secondly, we introduce a multi-dilated attention-based cross-stage partial connection (MDACSP) block that helps in increasing the receptive field and producing per-channel modulation weights after aggregating the feature maps across their spatial domain. Third, scaled feature fusion head (SFFH) fuses both the FFSPP block features and the connected MDACSP block features specific for this head. For a more robust result across different scenarios, we perform cross-ensembling with three of the top UAV/traffic surveillance datasets: UAVDT, UA-DETRAC and VisDrone. Our ablation study shows how every contribution improves over the baseline. Our approach yielded the state-of-the-art results in all the aforementioned datasets achieving 89.3% mAP, 93.5% mAP, and 42.9% mAP respectively. Testing the model performance on NVIDIA Jetson Xavier NX board shows a desirable balance between the inference time and the memory cost. We also show qualitatively the model robustness and efficiency across the diverse complex scenarios of these datasets. We hope this work facilitates the advancement of the UAV-based perception in such crucial industrial applications.

查看译文

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要