SwinEFT: a robust and powerful Swin Transformer based Event Frame Tracker

Applied Intelligence (2023)

Abstract
Event cameras, a new generation of bionic cameras with high dynamic range and high temporal resolution, provide a new and competitive modality for multi-modal tracking. However, recent work on RGB-event (RGBE) tracking focuses heavily on exploiting complementary information while neglecting to enhance modality-shared information and the global relations within and across modalities. In this paper, we propose an end-to-end, fully attention-based tracker named Swin Transformer Event Frame Tracker (SwinEFT) that explores both modality-specific and modality-shared information. Specifically, we first adopt a simple but effective event representation to narrow the domain gap and obtain a clearer tracking target. By deploying a shifted-window-based attention mechanism, our tracker better leverages global relations and thus locates a more accurate bounding box. In addition, to enhance modality-shared information, we design a Swin Decoder that introduces cross-attention based on shifted windows for information interaction. Extensive experiments on two realistic RGBE tracking datasets demonstrate the strong performance and robustness of SwinEFT against state-of-the-art methods under various challenging scenarios.
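The abstract does not give implementation details, so the following is only a minimal sketch of what cross-attention restricted to shifted windows can look like between two modality feature maps. It is not the SwinEFT code. The function names, the window size, the shift amount, and the use of NumPy are all assumptions made for illustration. Learned query/key/value projections, multiple heads, relative position bias, and the attention mask that Swin Transformer applies after the cyclic shift are omitted for brevity.

```python
import numpy as np


def window_partition(x, ws):
    """Split an (H, W, C) map into (num_windows, ws*ws, C) token groups."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)


def window_reverse(windows, ws, H, W):
    """Inverse of window_partition: rebuild the (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)


def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def shifted_window_cross_attention(q_feat, kv_feat, ws=4, shift=0):
    """Hypothetical helper: tokens of q_feat attend to tokens of kv_feat,
    but only inside the same (optionally cyclically shifted) window."""
    H, W, C = q_feat.shape
    if shift:
        # Cyclic shift so that window borders move between layers.
        q_feat = np.roll(q_feat, (-shift, -shift), axis=(0, 1))
        kv_feat = np.roll(kv_feat, (-shift, -shift), axis=(0, 1))
    q = window_partition(q_feat, ws)    # (nW, ws*ws, C)
    kv = window_partition(kv_feat, ws)  # (nW, ws*ws, C)
    # Scaled dot-product attention, computed independently per window.
    attn = softmax(q @ kv.transpose(0, 2, 1) / np.sqrt(C))
    out = window_reverse(attn @ kv, ws, H, W)
    if shift:
        # Undo the cyclic shift so outputs align with the input grid.
        out = np.roll(out, (shift, shift), axis=(0, 1))
    return out


# Example with random stand-ins for frame and event feature maps.
rng = np.random.default_rng(0)
frame_feat = rng.standard_normal((8, 8, 16))
event_feat = rng.standard_normal((8, 8, 16))
fused = shifted_window_cross_attention(frame_feat, event_feat, ws=4, shift=2)
```

In this sketch the frame features act as queries and the event features as keys and values. Swapping the two arguments gives the opposite direction of interaction, and alternating `shift=0` with `shift=ws // 2` across layers is how shifted-window designs let information cross window boundaries.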
Keywords
Visual object tracking, Event camera, Multi-modal fusion, Transformer