
AMTrack: Transformer Tracking via Action Information and Mix-Frequency Features

EXPERT SYSTEMS WITH APPLICATIONS (2025)

Xi'an University of Posts & Telecommunications

Abstract
Transformer-based visual tracking algorithms have developed rapidly thanks to the Transformer's self-attention mechanism, which can model global information. Although self-attention effectively captures long-range dependencies in the feature space, it operates only on flattened two-dimensional features and cannot capture long-range temporal dependencies. Furthermore, because self-attention acts as a low-pass filter, it captures the target's low-frequency features while ignoring high-frequency ones. To address these problems, this paper proposes a Transformer tracker based on action information and mix-frequency features (AMTrack). Specifically, to compensate for the missing long-range temporal dependencies, we introduce a target action aware module and a target action offset module. The target action aware module sets up several pathways to extract spatiotemporal, channel, and motion features independently, while the target action offset module derives the target's offset information from relative feature maps. Furthermore, to address the imbalance between high- and low-frequency features, we propose a mix-frequency attention and a multi-frequency self-attention convolutional block. The mix-frequency attention feeds high-frequency features within partitioned local windows to its high-frequency branch and average-pooled low-frequency features to its low-frequency branch, computing attention scores in each branch separately. The multi-frequency self-attention convolutional block uses self-attention to capture low-frequency features and convolution to capture high-frequency features. Extensive experiments on eight challenging tracking datasets (OTB100 (Object Tracking Benchmark 100), NFS (Need For Speed), UAV123 (Unmanned Aerial Vehicles 123), TC128 (Temple Color 128), VOT2018 (Visual Object Tracking 2018), LaSOT (Large-scale Single Object Tracking), TrackingNet (Tracking Network), and GOT-10k (Generic Object Tracking-10k)) show that our tracker performs strongly against several state-of-the-art tracking algorithms. On LaSOT, the AUC (Area Under Curve), P_Norm (normalized precision), and P (precision) reach 65.8%, 69.2%, and 68.0%, respectively, with the AUC 2.1% higher than that of the baseline TrDiMP (Transformer Discriminative Model Prediction). Our tracker also achieves excellent performance on the other datasets.
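To make the two-branch mix-frequency attention described in the abstract concrete, below is a minimal PyTorch sketch: the high-frequency branch computes attention within partitioned local windows, and the low-frequency branch computes attention over average-pooled features before upsampling back. All class names, parameters, and shape choices here are illustrative assumptions drawn from the abstract, not the authors' actual implementation.

```python
# Minimal sketch of a two-branch mix-frequency attention, assuming PyTorch.
# Names, shapes, and the fusion step are illustrative guesses from the
# abstract, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixFrequencyAttention(nn.Module):
    def __init__(self, dim, num_heads=4, window=4, pool=4):
        super().__init__()
        self.window = window  # local window size for the high-frequency branch
        self.pool = pool      # average-pooling factor for the low-frequency branch
        self.high_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.low_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x):
        # x: (B, C, H, W); H and W assumed divisible by window and pool.
        B, C, H, W = x.shape
        w = self.window

        # High-frequency branch: attention inside partitioned local windows.
        hi = x.view(B, C, H // w, w, W // w, w)
        hi = hi.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        hi, _ = self.high_attn(hi, hi, hi)
        hi = hi.reshape(B, H // w, W // w, w, w, C)
        hi = hi.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

        # Low-frequency branch: attention over average-pooled features,
        # upsampled back to the original resolution.
        lo = F.avg_pool2d(x, self.pool)
        lo_seq = lo.flatten(2).transpose(1, 2)  # (B, h*w, C)
        lo_seq, _ = self.low_attn(lo_seq, lo_seq, lo_seq)
        lo = lo_seq.transpose(1, 2).reshape(B, C, H // self.pool, W // self.pool)
        lo = F.interpolate(lo, size=(H, W), mode="bilinear", align_corners=False)

        # Fuse the two frequency branches with a linear projection.
        out = torch.cat([hi, lo], dim=1).permute(0, 2, 3, 1)  # (B, H, W, 2C)
        return self.proj(out).permute(0, 3, 1, 2)             # (B, C, H, W)

if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)
    y = MixFrequencyAttention(dim=64)(x)
    print(y.shape)  # torch.Size([1, 64, 32, 32])
```

The key design point the abstract suggests is the split of responsibility: windowed attention keeps fine-grained (high-frequency) detail local, while pooled attention models coarse (low-frequency) global structure cheaply.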
Key words
Visual object tracking, Siamese network, Transformer, Target action aware, Mix-frequency features