Hedged Deep Tracking

CVPR, pp. 4303-4311, 2016.

We propose a novel convolutional neural network (CNN) based tracking framework that uses an adaptive online decision learning algorithm to hedge weak trackers, obtained by applying correlation filters to CNN feature maps, into a single stronger tracker.

Abstract:

In recent years, several methods have been developed to utilize hierarchical features learned from a deep convolutional neural network (CNN) for visual tracking. However, as features from a certain CNN layer characterize an object of interest from only one aspect or one level, the performance of such trackers trained with features from on...

Introduction
  • Visual tracking has become a topic of increasing interest over the past couple of decades due to its importance in numerous applications, such as intelligent video surveillance, vehicle navigation, and human-computer interaction.
  • Empirical studies using a large object tracking benchmark show that CNN-based trackers outperform trackers built on hand-crafted features such as HOG [6], SIFT [24], and color histogram [28, 1].
  • Despite achieving state-of-the-art performance, existing CNN-based trackers still have some limitations.
  • Most of these methods represent target objects using only features from the very last layers of CNNs, which capture rich category-level semantic information and are useful for object classification.
  • By factoring in the historical performance of experts when making decisions, the authors propose an improved Hedge algorithm to update the weights of all experts, which is more suitable for real-world tracking tasks.
Highlights
  • We propose a novel convolutional neural network based tracking algorithm, which first builds weak trackers from convolutional layers by applying correlation filters on the layer output, and hedges all weak trackers into a single stronger one using an online decision-theoretical Hedge algorithm
  • The tracking result in the current frame is the weighted decisions of all experts, which combines advantages of all the considered convolutional neural network layers
  • By factoring in historical performance of experts to make decisions, we propose an improved Hedge algorithm to update the weights of all experts, which is more suitable for real-world tracking tasks
  • We provide a comparison on 50 sequences in Figure 4, with one-pass evaluation results for CNN-SVM taken from [18]
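The weighting scheme in the highlights above can be illustrated with a small sketch. This is the standard multiplicative-weights Hedge update of Freund and Schapire, not the improved variant proposed in the paper, whose details are not given on this page. The function names, the learning rate `eta`, and the distance-based loss are assumptions chosen for illustration only.

```python
import math

def combine(positions, weights):
    """Weighted average of the experts' predicted (x, y) target positions."""
    x = sum(w * p[0] for w, p in zip(weights, positions))
    y = sum(w * p[1] for w, p in zip(weights, positions))
    return (x, y)

def hedge_update(weights, losses, eta=1.0):
    """Standard Hedge step: scale each weight by exp(-eta * loss), then renormalize."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Three hypothetical experts (one per CNN layer), starting with equal weights.
weights = [1 / 3, 1 / 3, 1 / 3]
positions = [(10.0, 20.0), (12.0, 22.0), (30.0, 5.0)]

# Final decision for this frame: the weighted combination of all experts.
target = combine(positions, weights)

# Assumed loss: each expert's distance to the combined decision, scaled to [0, 1].
dists = [math.hypot(p[0] - target[0], p[1] - target[1]) for p in positions]
losses = [d / max(dists) for d in dists]

# Experts that disagree most with the combined decision lose the most weight.
weights = hedge_update(weights, losses)
```

In this toy setup the third expert, whose prediction lies far from the other two, ends up with the smallest weight after the update.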
Results
  • The authors present extensive experimental evaluations on the proposed hedged deep tracker (HDT).
  • After the forward propagation, the authors use the outputs from six convolutional layers (10th∼12th, 14th∼16th) as six types of feature maps and all feature maps are resized to the same size.
  • This setting simultaneously takes the feature diversities and the computational cost into consideration.
  • The authors' implementation runs at 10 frames per second on a computer with an Intel I7-4790K
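Resizing all feature maps to a common size can be sketched as follows. The page does not say which interpolation the authors use, so bilinear interpolation with aligned corners is an assumption here, and `resize_bilinear` is a hypothetical helper that operates on a single-channel map stored as a list of rows.

```python
def resize_bilinear(fmap, out_h, out_w):
    """Resize a 2-D feature map (list of rows) with bilinear interpolation."""
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_h):
        # Map the output row to a fractional input row (corners aligned).
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
            bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out

# Two hypothetical maps of different resolution brought to the same 3x3 size.
coarse = [[0.0, 1.0], [2.0, 3.0]]
fine = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
resized = [resize_bilinear(m, 3, 3) for m in (coarse, fine)]
```

Once the maps share a spatial size, the response maps of the per-layer trackers can be compared and combined position by position.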
Conclusion
  • The authors propose a novel CNN-based tracking framework which uses an adaptive online decision learning algorithm to hedge weak trackers, obtained by correlation filters on CNN feature maps, into a stronger one to achieve better results.
  • To the best of the authors' knowledge, the proposed algorithm is the first to adaptively hedge features from different CNN layers in an online manner for visual tracking.
  • Extensive experimental evaluations on a large-scale benchmark dataset demonstrate the effectiveness of the proposed hedged deep tracking algorithm.
Related work
  • We give a brief review of tracking methods closely related to this work. Comprehensive reviews on visual tracking approaches can be found in [23, 27].

    Correlation filter based trackers. Correlation filters were introduced into visual tracking for their computational efficiency [4, 16, 17]. These methods approximate the dense sampling scheme by generating a circulant matrix, of which each row denotes a vectorized sample. As such, the regression model can be computed in the Fourier domain, which brings a large speed improvement in both the training and testing stages. Bolme et al. [4] develop the Minimum Output Sum of Squared Error (MOSSE) method to learn the filters, and use intensity features for object representation. In [16], Henriques et al. propose a tracking method based on correlation filters by introducing kernel methods and employing ridge regression. Subsequently, a method that extends the input features from a single channel to multiple channels (e.g., HOG) is presented [17]. Danelljan et al. [7] propose an algorithm that searches over scale space for correlation filters to handle large variation in object size. However, all the above-mentioned works use only one correlation filter, which limits the power of trackers based on correlation filters. In this work, we exploit the computational efficiency of correlation filters to construct an ensemble tracker where each component tracker is based on features extracted from one convolutional layer of a CNN.
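The Fourier-domain regression described above can be illustrated in one dimension. The sketch below is not the authors' implementation: it is a MOSSE-style filter learned from a single sample with a ridge term, it assumes a real-valued 1-D signal, and it uses a naive DFT written with the standard library to stay self-contained. The names `dft`, `idft`, `train_filter`, `respond`, and the regularizer `lam` are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform, O(n^2), for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def train_filter(x, y, lam=1e-3):
    """Per-frequency ridge solution: H = Y * conj(X) / (X * conj(X) + lam)."""
    X, Y = dft(x), dft(y)
    return [Yk * Xk.conjugate() / (Xk * Xk.conjugate() + lam)
            for Xk, Yk in zip(X, Y)]

def respond(h, z):
    """Response map of the learned filter on a new signal z."""
    Z = dft(z)
    return [v.real for v in idft([hk * zk for hk, zk in zip(h, Z)])]

n = 8
x = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]                  # training signal
y = [math.exp(-0.5 * min(t, n - t) ** 2) for t in range(n)]  # desired output, peak at 0
h = train_filter(x, y)

# A cyclically shifted copy of x plays the role of the target in the next frame.
shift = 3
z = [x[(t - shift) % n] for t in range(n)]
response = respond(h, z)

# The location of the response peak estimates the displacement of the target.
peak = max(range(n), key=lambda t: response[t])
```

Because every cyclic shift of `x` is handled implicitly by the element-wise division in the Fourier domain, no explicit circulant matrix is ever built. A practical tracker would use a 2-D FFT over multi-channel feature maps instead of this naive 1-D transform.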
Funding
  • This work was supported in part by National Basic Research Program of China (973 Program) 2015CB351802 and 2012CB316400, in part by National Natural Science Foundation of China 61332016, 61025011, 61133003, 61472103, 61390510, 61572465, and 61300111
  • Lim is supported partly by R&D programs by NRF (2014R1A1A2058501) and MSIP/NIPA (H8601-15-1005)
  • Yang is supported partly by the National Science Foundation CAREER Grant 1149783 and IIS Grant 1152576, and a gift from Adobe
Reference
  • [1] A. Adam, E. Rivlin, and I. Shimshoni. Robust fragments-based tracking using the integral histogram. In CVPR, 2006.
  • [2] S. Avidan. Ensemble tracking. TPAMI, 29(2):261–271, 2007.
  • [3] Q. Bai, Z. Wu, S. Sclaroff, M. Betke, and C. Monnier. Randomized ensemble tracking. In ICCV, 2012.
  • [4] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui. Visual object tracking using adaptive correlation filters. In CVPR, 2010.
  • [5] K. Chaudhuri, Y. Freund, and D. Hsu. A parameter-free hedging algorithm. In NIPS, 2009.
  • [6] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
  • [7] M. Danelljan, G. Hager, F. S. Khan, and M. Felsberg. Accurate scale estimation for robust visual tracking. In BMVC, 2014.
  • [8] T. B. Dinh, N. Vo, and G. G. Medioni. Context tracker: Exploring supporters and distracters in unconstrained environments. In CVPR, 2011.
  • [9] J. Fan, W. Xu, Y. Wu, and Y. Gong. Human tracking using convolutional neural networks. TNN, 21(10):1610–1623, 2010.
  • [10] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational learning theory, 1995.
  • [11] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
  • [12] H. Grabner, M. Grabner, and H. Bischof. Real-time tracking via on-line boosting. In BMVC, 2006.
  • [13] H. Grabner, C. Leistner, and H. Bischof. Semi-supervised on-line boosting for robust tracking. In ECCV, 2008.
  • [14] S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured output tracking with kernels. In ICCV, 2011.
  • [15] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, 2014.
  • [16] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. Exploiting the circulant structure of tracking-by-detection with kernels. In ECCV, 2012.
  • [17] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with kernelized correlation filters. TPAMI, 37(3):583–596, 2015.
  • [18] S. Hong, T. You, S. Kwak, and B. Han. Online tracking by learning discriminative saliency map with convolutional neural network. In ICML, 2015.
  • [19] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. B. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM Multimedia, 2014.
  • [20] Z. Kalal, J. Matas, and K. Mikolajczyk. P-N learning: Bootstrapping binary classifiers by structural constraints. In CVPR, 2010.
  • [21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
  • [22] G. Li, L. Qin, Q. Huang, J. Pang, and S. Jiang. Treat samples differently: Object tracking with semi-supervised online covboost. In ICCV, 2011.
  • [23] X. Li, W. Hu, C. Shen, Z. Zhang, A. R. Dick, and A. van den Hengel. A survey of appearance models in visual object tracking. ACM TIST, 4(4):58, 2013.
  • [24] D. G. Lowe. Object recognition from local scale-invariant features. In ICCV, 1999.
  • [25] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang. Hierarchical convolutional features for visual tracking. In ICCV, 2015.
  • [26] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
  • [27] A. W. M. Smeulders, D. M. Chu, R. Cucchiara, S. Calderara, A. Dehghan, and M. Shah. Visual tracking: An experimental survey. TPAMI, 36(7):1442–1468, 2014.
  • [28] G. Tian, R. Hu, Z. Wang, and Y. Fu. Improved object tracking algorithm based on new HSV color probability model. In ISNN, 2009.
  • [29] A. Vedaldi and K. Lenc. Matconvnet: Convolutional neural networks for MATLAB. In ACM Multimedia, 2015.
  • [30] N. Wang and D.-Y. Yeung. Learning a deep compact image representation for visual tracking. In NIPS, 2013.
  • [31] N. Wang and D.-Y. Yeung. Ensemble-based tracking: Aggregating crowdsourced structured time series data. In ICML, 2014.
  • [32] L. Wen, D. Du, Z. Lei, S. Li, and M.-H. Yang. Jots: Joint online tracking and segmentation. In CVPR, 2015.
  • [33] Y. Wu, J. Lim, and M.-H. Yang. Online object tracking: A benchmark. In CVPR, 2013.
  • [34] Y. Wu, J. Lim, and M.-H. Yang. Object tracking benchmark. TPAMI, 37:1834–1848, 2015.
  • [35] B. Zhang, A. Perina, Z. Li, V. Murino, J. Liu, and R. Ji. Bounding multiple gaussians uncertainty with application to object tracking. IJCV, pages 1–16, 2016.
  • [36] J. Zhang, S. Ma, and S. Sclaroff. MEEM: robust tracking via multiple experts using entropy minimization. In ECCV, 2014.
  • [37] S. Zhang, S. Kasiviswanathan, P. C. Yuen, and M. Harandi. Online dictionary learning on symmetric positive definite manifolds with vision applications. In AAAI, 2015.
  • [38] S. Zhang, H. Yao, X. Sun, and X. Lu. Sparse coding based visual tracking: Review and experimental comparison. Pattern Recognition, 46:1772–1788, 2013.
  • [39] S. Zhang, H. Zhou, F. Jiang, and X. Li. Robust visual tracking using structurally random projection and weighted least squares. IEEE Trans. Circuits Syst. Video Techn., 25:1749–1760, 2015.
  • [40] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja. Robust visual tracking via multi-task sparse learning. In CVPR, 2012.
  • [41] W. Zhong, H. Lu, and M.-H. Yang. Robust object tracking via sparsity-based collaborative model. In CVPR, 2012.