Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking

ECCV, 2016.

Keywords: convolutional neural networks, spatial robustness, Good Features to Track, continuous convolution, feature point tracking

Abstract:

Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual object tracking. The key to their success is the ability to efficiently exploit available negative data by including all shifted versions of a training sample. However, the underlying DCF formulation is restricted to single-resolution feature maps, ...

Introduction
  • Visual tracking is the task of estimating the trajectory of a target in a video.
  • It is one of the fundamental problems in computer vision.
  • Discriminative Correlation Filter (DCF) based approaches have shown outstanding results on object tracking benchmarks [30,46].
  • DCF methods train a correlation filter to predict the target classification scores.
  • The DCF efficiently utilizes all spatial shifts of the training samples by exploiting the discrete Fourier transform.
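This Fourier-domain trick is the core of DCF training: ridge regression over all cyclic shifts of a patch diagonalizes under the DFT and reduces to element-wise operations. A minimal single-channel sketch in NumPy (function names, the Gaussian label, and the regularization weight `lam` are illustrative, not the paper's exact setup):

```python
import numpy as np

def train_dcf(x, y, lam=1e-2):
    """Ridge regression over all cyclic shifts of the patch x, targeting
    the desired response y. Via the DFT the problem diagonalizes, so the
    (conjugate) filter is obtained by an element-wise division."""
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def detect(H_conj, z):
    """Evaluate the filter on a test patch z; the location of the
    response peak is the predicted target translation."""
    return np.real(np.fft.ifft2(np.fft.fft2(z) * H_conj))

# Train on one patch with a Gaussian label centered on the target,
# then detect: the response peaks at the labeled target location.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
yy, xx = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
y = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 2.0 ** 2))
H = train_dcf(x, y, lam=1e-4)
peak = np.unravel_index(np.argmax(detect(H, x)), (32, 32))
```

Training and detection each cost only a few FFTs, which is why DCF trackers can run at high frame rates even with dense sliding-window sampling.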
Highlights
  • Visual tracking is the task of estimating the trajectory of a target in a video
  • On the challenging OTB-2015 with 100 videos, our object tracking framework improves the state-of-the-art from 77.3% to 82.4% in mean overlap precision
  • We show that the advantages of our continuous formulation are crucial for accurate feature point tracking
  • We propose a generic framework for learning discriminative convolution operators in the continuous spatial domain
  • Our formulation enables the integration of multi-resolution feature maps
  • Experiments on three object tracking benchmarks demonstrate that our approach achieves superior performance compared to the state-of-the-art
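The multi-resolution integration above rests on mapping each discrete feature channel, whatever its sampling rate, into one common continuous spatial domain via an interpolation operator J{x}(t) = Σ_n x[n] b(t − n/N). A minimal 1-D sketch, using a periodic linear kernel purely for brevity (the paper derives its interpolation function from a cubic spline, and the names and toy resolutions here are illustrative):

```python
import numpy as np

def to_continuous(x, t):
    """Interpolate a 1-D feature channel x (N samples over the period
    [0, 1)) at continuous positions t, mimicking the operator
    J{x}(t) = sum_n x[n] * b(t - n/N) with a periodic linear kernel b."""
    N = len(x)
    pos = np.asarray(t) * N              # continuous sample coordinate
    i0 = np.floor(pos).astype(int) % N
    i1 = (i0 + 1) % N                    # periodic (circular) boundary
    frac = pos - np.floor(pos)
    return (1 - frac) * x[i0] + frac * x[i1]

# Channels of different resolutions evaluated on one continuous grid:
t = np.linspace(0, 1, 200, endpoint=False)
coarse = to_continuous(np.arange(4, dtype=float), t)    # N1 = 4 samples
fine = to_continuous(np.arange(16, dtype=float), t)     # N2 = 16 samples
fused = coarse + fine    # both live in the same domain, so fusion is direct
```

Because every channel is lifted to the same domain, a deep network's shallow high-resolution and deep low-resolution layers can be combined without resampling any of them to a common grid.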
Methods
  • The authors validate the learning framework for two applications: tracking of objects and feature points.
  • The authors perform comprehensive experiments on three datasets: OTB-2015 [46], Temple-Color [32], and VOT2015 [29].
  • The authors perform extensive experiments on the MPI Sintel dataset [7].
  • Table 1 shows the tracking results, in mean overlap precision (OP) and area-under-the-curve (AUC), on the OTB-2015 dataset.
  • For details about the OTB protocol, the authors refer to [45]
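The two OTB metrics can be sketched as follows: OP is the fraction of frames whose bounding-box IoU with the ground truth exceeds 0.5, and AUC averages the success plot over IoU thresholds. A hedged implementation (the threshold sampling here is illustrative and may differ slightly from the official toolkit):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def op_and_auc(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """Mean overlap precision (IoU > 0.5) and the area under the
    success plot, following the spirit of the OTB protocol."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred, gt)])
    op = np.mean(overlaps > 0.5)                       # overlap precision
    success = [np.mean(overlaps > t) for t in thresholds]
    auc = np.mean(success)                             # success-plot AUC
    return op, auc

# Toy example: one perfect frame, one box shifted by half its width.
pred = [(0, 0, 10, 10), (5, 0, 10, 10)]
gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
op, auc = op_and_auc(pred, gt)
```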
Results
  • On the challenging OTB-2015 with 100 videos, the object tracking framework improves the state-of-the-art from 77.3% to 82.4% in mean overlap precision.
  • Compared to MOSSE, the method obtains significantly improved precision at sub-pixel thresholds (< 1 pixel).
  • The authors' method obtains substantially improved accuracy and robustness for real-time feature point tracking.
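Precision at sub-pixel thresholds requires localizing the response peak between grid points. The paper does this by maximizing its continuous score function directly; a common discrete-map alternative, shown here only to illustrate the idea, fits a parabola through the integer peak and its neighbours along each axis:

```python
import numpy as np

def subpixel_peak(r):
    """Refine the integer argmax of a 2-D response map r to sub-pixel
    accuracy with a 1-D parabolic fit along each axis. The vertex of a
    parabola through values (a, b, c) at offsets (-1, 0, 1) lies at
    0.5 * (a - c) / (a - 2b + c)."""
    i, j = np.unravel_index(np.argmax(r), r.shape)
    di = dj = 0.0
    if 0 < i < r.shape[0] - 1:
        denom = r[i - 1, j] - 2 * r[i, j] + r[i + 1, j]
        if denom != 0:
            di = 0.5 * (r[i - 1, j] - r[i + 1, j]) / denom
    if 0 < j < r.shape[1] - 1:
        denom = r[i, j - 1] - 2 * r[i, j] + r[i, j + 1]
        if denom != 0:
            dj = 0.5 * (r[i, j - 1] - r[i, j + 1]) / denom
    return i + di, j + dj

# A Gaussian bump centered off-grid is recovered to sub-pixel accuracy.
yy, xx = np.meshgrid(np.arange(20), np.arange(20), indexing="ij")
r = np.exp(-((yy - 10.3) ** 2 + (xx - 7.6) ** 2) / (2 * 3.0 ** 2))
pi, pj = subpixel_peak(r)
```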
Conclusion
  • The authors propose a generic framework for learning discriminative convolution operators in the continuous spatial domain.
  • The authors validate the framework for two problems: object tracking and feature point tracking.
  • The authors' formulation enables the integration of multi-resolution feature maps.
  • The authors' approach is capable of accurate sub-pixel localization.
  • Experiments on three object tracking benchmarks demonstrate that the approach achieves superior performance compared to the state-of-the-art.
  • The authors' method obtains substantially improved accuracy and robustness for real-time feature point tracking.
Tables
  • Table 1: A baseline comparison when using different combinations of convolutional layers in our object tracking framework. We report the mean OP (%) and AUC (%) on the OTB-2015 dataset. The best results are obtained when combining all three layers in our framework. The results clearly show the importance of multi-resolution deep feature maps for improved object tracking performance.
  • Table 2: A comparison with state-of-the-art methods on the OTB-2015 and Temple-Color datasets. We report the mean OP (%) for the top 10 methods on each dataset. Our approach outperforms DeepSRDCF by 5.1% and 5.0%, respectively.
  • Table 3: A comparison with state-of-the-art methods on the VOT2015 dataset. The results are presented in terms of robustness and accuracy. Our approach provides improved robustness with a significant reduction in failure rate.
Related work
  • Discriminative Correlation Filters (DCF) [5,11,24] have shown promising results for object tracking. These methods exploit the properties of circular correlation for training a regressor in a sliding-window fashion. Initially, the DCF approaches [5,23] were restricted to a single feature channel. The DCF framework was later extended to multi-channel feature maps [4,13,17]. The multi-channel DCF allows high-dimensional features, such as HOG and Color Names, to be incorporated for improved tracking. In addition to the incorporation of multi-channel features, the DCF framework has been significantly improved lately by, e.g., including scale estimation [9,31], non-linear kernels [23,24], a long-term memory [36], and by alleviating the periodic effects of circular convolution [11,15,18].
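For the multi-channel extension with a single training sample, the D × D normal equations at each Fourier frequency are rank-one, so the filter still has an element-wise closed form: h_d = conj(X_d) · Y / (Σ_d' |X_d'|² + λ). A sketch under those assumptions (function names are illustrative; practical trackers add windowing, online updates, and multiple samples, which break the rank-one structure):

```python
import numpy as np

def train_multichannel_dcf(x, y, lam=1e-2):
    """Multi-channel DCF from one sample. x : (D, H, W) feature map,
    y : (H, W) desired response. The rank-one per-frequency system
    gives the element-wise solution h_d = conj(X_d) Y / (energy + lam)."""
    X = np.fft.fft2(x, axes=(-2, -1))
    Y = np.fft.fft2(y)
    energy = np.sum(np.abs(X) ** 2, axis=0) + lam
    return np.conj(X) * Y / energy          # (D, H, W) Fourier-domain filter

def detect_mc(H, z):
    """Sum the per-channel responses of a test map z of shape (D, H, W)."""
    Z = np.fft.fft2(z, axes=(-2, -1))
    return np.real(np.fft.ifft2(np.sum(H * Z, axis=0)))

# On the training sample itself, the summed response peaks at the label.
rng = np.random.default_rng(1)
x = rng.standard_normal((3, 16, 16))
yy, xx = np.meshgrid(np.arange(16), np.arange(16), indexing="ij")
y = np.exp(-((yy - 8) ** 2 + (xx - 8) ** 2) / (2 * 1.5 ** 2))
H = train_multichannel_dcf(x, y, lam=1e-4)
peak = np.unravel_index(np.argmax(detect_mc(H, x)), (16, 16))
```

This is what lets high-dimensional features such as HOG, Color Names, or deep activations be folded into the same efficient Fourier-domain machinery.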
Funding
  • Acknowledgments: This work has been supported by SSF (CUAS), VR (EMC2), CENTAURO, the Wallenberg Autonomous Systems Program, NSC and Nvidia
References
  • Badino, H., Yamamoto, A., Kanade, T.: Visual odometry by multi-frame feature integration. In: ICCV Workshop (2013)
  • Baker, S., Matthews, I.A.: Lucas-Kanade 20 years on: A unifying framework. IJCV 56(3), 221–255 (2004)
  • Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., Torr, P.H.S.: Staple: Complementary learners for real-time tracking. In: CVPR (2016)
  • Boddeti, V.N., Kanade, T., Kumar, B.V.K.V.: Correlation filters for object alignment. In: CVPR (2013)
  • Bolme, D.S., Beveridge, J.R., Draper, B.A., Lui, Y.M.: Visual object tracking using adaptive correlation filters. In: CVPR (2010)
  • Bouguet, J.Y.: Pyramidal implementation of the Lucas-Kanade feature tracker. Tech. rep., Microprocessor Research Labs, Intel Corporation (2000)
  • Butler, D.J., Wulff, J., Stanley, G.B., Black, M.J.: A naturalistic open source movie for optical flow evaluation. In: ECCV (2012)
  • Cimpoi, M., Maji, S., Vedaldi, A.: Deep filter banks for texture recognition and segmentation. In: CVPR (2015)
  • Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Accurate scale estimation for robust visual tracking. In: BMVC (2014)
  • Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Convolutional features for correlation filter based visual tracking. In: ICCV Workshop (2015)
  • Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Learning spatially regularized correlation filters for visual tracking. In: ICCV (2015)
  • Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Adaptive decontamination of the training set: A unified formulation for discriminative visual tracking. In: CVPR (2016)
  • Danelljan, M., Shahbaz Khan, F., Felsberg, M., van de Weijer, J.: Adaptive color attributes for real-time visual tracking. In: CVPR (2014)
  • Felsberg, M.: Enhanced distribution field tracking using channel representations. In: ICCV Workshop (2013)
  • Fernandez, J.A., Boddeti, V.N., Rodriguez, A., Kumar, B.V.K.V.: Zero-aliasing correlation filters for object recognition. TPAMI 37(8), 1702–1715 (2015)
  • Fusiello, A., Trucco, E., Tommasini, T., Roberto, V.: Improving feature tracking with robust statistics. Pattern Anal. Appl. 2(4), 312–320 (1999)
  • Galoogahi, H.K., Sim, T., Lucey, S.: Multi-channel correlation filters. In: ICCV (2013)
  • Galoogahi, H.K., Sim, T., Lucey, S.: Correlation filters with limited boundaries. In: CVPR (2015)
  • Gao, J., Ling, H., Hu, W., Xing, J.: Transfer learning based visual tracking with Gaussian process regression. In: ECCV (2014)
  • Gladh, S., Danelljan, M., Shahbaz Khan, F., Felsberg, M.: Deep motion features for visual tracking. In: ICPR (2016)
  • Hare, S., Saffari, A., Torr, P.: Struck: Structured output tracking with kernels. In: ICCV (2011)
  • He, S., Yang, Q., Lau, R., Wang, J., Yang, M.H.: Visual tracking via locality sensitive histograms. In: CVPR (2013)
  • Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: Exploiting the circulant structure of tracking-by-detection with kernels. In: ECCV (2012)
  • Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. TPAMI 37(3), 583–596 (2015)
  • Jia, X., Lu, H., Yang, M.H.: Visual tracking via adaptive structural local sparse appearance model. In: CVPR (2012)
  • Kalal, Z., Matas, J., Mikolajczyk, K.: P-N learning: Bootstrapping binary classifiers by structural constraints. In: CVPR (2010)
  • Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. In: ISMAR (2007)
  • Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Cehovin, L., Vojır, T., Hager, G., Lukezic, A., Fernandez, G.: The visual object tracking VOT2016 challenge results. In: ECCV Workshop (2016)
  • Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Cehovin, L., Fernandez, G., Vojır, T., Nebehay, G., Pflugfelder, R., Hager, G.: The visual object tracking VOT2015 challenge results. In: ICCV Workshop (2015)
  • Kristan, M., Pflugfelder, R., Leonardis, A., Matas, J., et al.: The visual object tracking VOT2014 challenge results. In: ECCV Workshop (2014)
  • Li, Y., Zhu, J.: A scale adaptive kernel correlation filter tracker with feature integration. In: ECCV Workshop (2014)
  • Liang, P., Blasch, E., Ling, H.: Encoding color information for visual tracking: Algorithms and benchmark. TIP 24(12), 5630–5644 (2015)
  • Liu, L., Shen, C., van den Hengel, A.: The treasure beneath convolutional layers: Cross-convolutional-layer pooling for image classification. In: CVPR (2015)
  • Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: IJCAI (1981)
  • Ma, C., Huang, J.B., Yang, X., Yang, M.H.: Hierarchical convolutional features for visual tracking. In: ICCV (2015)
  • Ma, C., Yang, X., Zhang, C., Yang, M.H.: Long-term correlation tracking. In: CVPR (2015)
  • Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, 2nd edn. (2006)
  • Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: CVPR (2014)
  • Ovren, H., Forssen, P.: Gyroscope-based video stabilisation with auto-calibration. In: ICRA (2015)
  • Possegger, H., Mauthner, T., Bischof, H.: In defense of color-based model-free tracking. In: CVPR (2015)
  • Shi, J., Tomasi, C.: Good features to track. In: CVPR (1994)
  • Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
  • (1991)
  • Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. TPAMI 37(9), 1834–1848 (2015)