Cost-Effective Active Learning for Deep Image Classification

    IEEE Transactions on Circuits and Systems for Video Technology, pp. 2591–2600, 2017. Manuscript received June 26, 2015; revised January 5, 2016 and April 25, 2016; accepted July 1, 2016.

    Keywords: machine learning, uncertainty, measurement uncertainty, learning systems, neural networks

    Abstract:

    Recent successes in learning-based image classification, however, heavily rely on a large number of annotated training samples, which may require considerable human effort. In this paper, we propose a novel active learning (AL) framework, which is capable of building a competitive classifier with optimal feature representation via a limited amount of labeled training instances in an incremental learning manner.

    Introduction
    • Aiming at improving existing models by incrementally selecting and annotating the most informative unlabeled samples, active learning (AL) has been well studied in the past few decades [3]–[12] and applied to various kinds of vision tasks.

    • In typical AL methods [3]–[5], the classifier/model is first initialized with a relatively small set of labeled training samples.
    • It is then continuously boosted by selecting some of the most informative samples and pushing them to the user for annotation (a minimal sketch of this loop is given below).
    • However, the effectiveness of AL on more challenging image classification tasks has not been well studied.
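
    A minimal sketch of this pool-based AL loop, assuming a scikit-learn-style model exposing fit/predict_proba and a hypothetical query_labels oracle standing in for the human annotator (both names are illustrative, not from the paper):

        import numpy as np

        def active_learning_loop(model, X_labeled, y_labeled, X_pool,
                                 query_labels, rounds=10, batch_size=100):
            """Generic pool-based AL (a sketch, not the authors' exact
            procedure): repeatedly train, score the unlabeled pool by
            uncertainty, and ask the oracle to label the top samples."""
            for _ in range(rounds):
                model.fit(X_labeled, y_labeled)
                probs = model.predict_proba(X_pool)    # (n_pool, n_classes)
                uncertainty = 1.0 - probs.max(axis=1)  # least-confidence score
                picked = np.argsort(-uncertainty)[:batch_size]
                y_new = query_labels(X_pool[picked])   # human annotation step
                X_labeled = np.concatenate([X_labeled, X_pool[picked]])
                y_labeled = np.concatenate([y_labeled, y_new])
                X_pool = np.delete(X_pool, picked, axis=0)
            return model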
    Highlights
    • Data Set Description: We evaluate our cost-effective AL framework on two challenging public benchmarks, i.e., the cross-age celebrity face recognition (CACD) data set [1] and the Caltech-256 object categorization data set [2].
    • Comparison Results: To demonstrate the effectiveness of our proposed framework, we apply the margin sampling criterion to measure the uncertainty of samples and denote this method by CEAL_MS (see the sketch after this list).
    • As illustrated in Fig. 3 and Table III(a) and (b), our proposed cost-effective AL framework outperforms the compared methods in both recognition accuracy and the amount of user annotation.
    • CEAL_MS needs only 78% of the training samples labeled, reducing user annotations by around 19% and 15% compared with AL_RAND and TCAL, respectively. This justifies that our proposed cost-effective AL framework can effectively reduce the need for labeled samples.
    • The average error rate of the pseudolabeled samples is quite low even at early iterations.
    • We propose a cost-effective AL framework for deep image classification tasks, which employs a complementary sample selection strategy: progressively select the minority of the most informative samples for annotation and pseudolabel the majority of high-confidence samples for model updating.
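
    The margin sampling criterion behind CEAL_MS scores a sample by the gap between its two largest predicted class probabilities: the smaller the gap, the more ambiguous and hence informative the sample. A minimal NumPy sketch (the softmax matrix probs is assumed to come from the CNN):

        import numpy as np

        def margin_sampling_scores(probs):
            """Margin per sample: difference between the top-2 class
            probabilities; a small margin means high uncertainty."""
            part = np.partition(probs, -2, axis=1)  # top-2 land in last columns
            return part[:, -1] - part[:, -2]

        # Pick the k most informative (smallest-margin) samples to annotate.
        probs = np.array([[0.50, 0.45, 0.05],   # ambiguous -> margin 0.05
                          [0.98, 0.01, 0.01]])  # confident -> margin 0.97
        print(np.argsort(margin_sampling_scores(probs))[:1])  # -> [0]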
    Methods
    • To demonstrate that the proposed CEAL framework can improve classification performance with less labeled data, the authors compare CEAL with a state-of-the-art AL method [triple criteria AL (TCAL)] and baseline methods (AL_ALL and AL_RAND).

      1) AL_ALL: The authors manually label all the training samples and use them to train the CNN; this serves as the upper bound of performance.
    • 2) AL_RAND: During the training process, the authors randomly select samples to be annotated to fine-tune the CNN.
    • This method discards all AL techniques and can be considered the lower bound.
    • For TCAL, the authors follow the pipeline of [3]: train an SVM classifier and apply the uncertainty, diversity, and density criteria to select the most informative samples (a rough sketch of these criteria is given after this list).
    • The SVM classifier of TCAL is updated with the newly annotated samples in each iteration.
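
    A rough sketch of the three TCAL criteria as we understand them from [3], with simplified stand-ins: uncertainty from the SVM margin, diversity via clustering the uncertain candidates, and density as mean similarity to the rest of the pool (the exact weighting and formulation in [3] differ):

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.metrics.pairwise import cosine_similarity
        from sklearn.svm import SVC

        def tcal_select(X_labeled, y_labeled, X_pool, n_query=10):
            """Simplified triple-criteria selection (not the exact TCAL)."""
            svm = SVC(kernel='linear').fit(X_labeled, y_labeled)
            # Uncertainty: closeness to the decision boundary.
            scores = svm.decision_function(X_pool)
            dist = np.abs(scores) if scores.ndim == 1 else np.abs(scores).min(axis=1)
            candidates = np.argsort(dist)[:n_query * 5]  # most uncertain pool
            # Diversity: cluster the candidates, one pick per cluster.
            km = KMeans(n_clusters=n_query, n_init=10).fit(X_pool[candidates])
            selected = []
            for c in range(n_query):
                members = candidates[km.labels_ == c]
                # Density: prefer the member lying in the densest region.
                sims = cosine_similarity(X_pool[members], X_pool).mean(axis=1)
                selected.append(members[np.argmax(sims)])
            return np.array(selected)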
    Results
    • To demonstrate the effectiveness of the proposed framework, the authors apply the MS criterion to measure the uncertainty of samples and denote this method by CEAL_MS.
    • Fig. 3 plots accuracy against the percentage of annotated samples for AL_RAND, AL_ALL, TCAL, and the proposed CEAL_MS on both the CACD and Caltech-256 data sets.
    • Each curve shows the classification accuracy achieved when different percentages of the whole training set have been annotated.
    • In terms of user annotation amount, to achieve 91.5% recognition accuracy on the CACD data set, AL_RAND and TCAL require 99% and 81% of the training samples to be labeled, respectively.
    • This justifies that the proposed CEAL framework can effectively reduce the need for labeled samples.
    Conclusion
    • The authors propose a CEAL framework for deep image classification tasks, which employs a complementary sample selection strategy: progressively select the minority of the most informative samples for annotation and pseudolabel the majority of high-confidence samples for model updating (one such round is sketched below).
    • In this holistic manner, the minority of labeled samples refines the classifier's decision boundary, while the majority of pseudolabeled samples provides sufficient training data for robust feature learning.
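
    One CEAL-style selection round can be sketched as follows, assuming probs holds the CNN's softmax outputs over the unlabeled pool and delta is the high-confidence entropy threshold (the paper adjusts this threshold across iterations; only the selection step of one iteration is shown here):

        import numpy as np

        def ceal_round(probs, delta, k):
            """One round of complementary selection: the k most uncertain
            samples go to the user; samples with prediction entropy below
            delta are pseudolabeled with their argmax class."""
            part = np.partition(probs, -2, axis=1)
            margin = part[:, -1] - part[:, -2]
            to_annotate = np.argsort(margin)[:k]            # informative minority
            entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
            confident = np.setdiff1d(np.where(entropy < delta)[0], to_annotate)
            pseudo_labels = probs[confident].argmax(axis=1)  # confident majority
            return to_annotate, confident, pseudo_labels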
    Tables
    • Table 1: Detailed configuration of the CNN architecture used on the CACD data set [1].
    • Table 2: Detailed configuration of the CNN architecture used on Caltech-256 [2]; it takes 256 × 256 × 3 images as input.
    • Table 3: Class accuracy at selected AL iterations on the (a) CACD and (b) Caltech-256 data sets.
    Related work
    The key idea of AL is that a learning algorithm should achieve higher accuracy with fewer labeled training samples if it is allowed to choose the samples from which it learns [31]. The instance selection scheme is therefore crucial. One of the most common strategies is uncertainty-based selection [12], [18], which measures the uncertainty of novel unlabeled samples from the predictions of previous classifiers. Lewis [12] proposed to extract the sample with the largest entropy (EN) on the conditional distribution over predicted labels as the most uncertain instance. The support vector machine (SVM)-based method [18] determined uncertain samples based on the relative distance between the candidate samples and the decision boundary. Some earlier works [19], [38] also determined sample uncertainty by referring to a committee of classifiers, i.e., examining the disagreement among class labels assigned by a set of classifiers; this theoretically motivated framework is called query-by-committee in the literature [31] (see the vote-entropy sketch below).

    All the above-mentioned uncertainty-based methods usually ignore the majority of certain unlabeled samples and are thus sensitive to outliers. Later methods take an information density measure into account and exploit the information of unlabeled data when selecting samples. These approaches usually select informative samples sequentially, relying on probability estimation [6], [37] or prior information [8] to minimize the generalization error of the trained classifier over the unlabeled data. For example, Joshi et al. [6] considered uncertainty sampling based on the probability estimation of class membership for all instances in the selection pool, which handles the multiclass case effectively. In [8], context constraints are introduced as priors to guide users to tag face images more efficiently. A series of works [7], [24] selects the samples that maximize the increase of mutual information between the candidate instance and the remaining unlabeled instances under the Gaussian process framework. Li and Guo [10] presented an adaptive AL approach that combines an information density measure and an uncertainty measure to label critical instances for image classification. Moreover, the diversity of the selected instances within a category was taken into consideration in [4], the pioneering work that expanded SVM-based AL from single mode to batch mode. Recently, Elhamifar et al. [11] integrated the uncertainty and diversity measures into a unified batch-mode framework via convex programming for unlabeled sample selection; this approach can work in conjunction with any type of classifier, not only max-margin ones.

    All the above-mentioned AL methods consider only low-confidence samples (e.g., uncertain and diverse samples), while losing sight of the large majority of high-confidence samples. We hold that, owing to their number and consistency, these high-confidence samples are also beneficial for improving accuracy and keeping classifiers stable. Moreover, we shall demonstrate that considering these high-confidence samples can also effectively reduce the user's annotation effort.
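
    As an illustration of the query-by-committee idea mentioned above, here is a minimal sketch of the vote-entropy disagreement measure (an assumed but standard formulation, not necessarily the one used in [19], [38]); the samples the committee disagrees on most are queried first:

        import numpy as np

        def vote_entropy(committee_preds, n_classes):
            """Disagreement among a committee of classifiers.
            committee_preds: (n_members, n_samples) matrix of hard labels."""
            n_members, n_samples = committee_preds.shape
            scores = np.zeros(n_samples)
            for c in range(n_classes):
                votes = (committee_preds == c).sum(axis=0) / n_members
                nz = votes > 0
                scores[nz] -= votes[nz] * np.log(votes[nz])
            return scores  # high entropy = strong disagreement = informative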
    Funding
    • This work was supported in part by the National Natural Science Foundation of China under Grant 61622214, in part by the State Key Development Program under Grant 2016YFB1001000, in part by the CCF-Tencent Open Fund, in part by the Special Program through the Applied Research on Super Computation of the Natural Science Foundation of China–Guangdong Joint Fund (the second phase), and in part by NVIDIA Corporation through the donation of a Tesla K40 GPU.
    References
    • B.-C. Chen, C.-S. Chen, and W. H. Hsu, "Cross-age reference coding for age-invariant face recognition and retrieval," in Proc. ECCV, 2014, pp. 768–783.
    • G. Griffin, A. Holub, and P. Perona, "Caltech-256 object category dataset," California Inst. Technol., Pasadena, CA, USA, Tech. Rep. 7694, 2007.
    • B. Demir and L. Bruzzone, "A novel active learning method in relevance feedback for content-based remote sensing image retrieval," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 5, pp. 2323–2334, May 2015.
    • K. Brinker, "Incorporating diversity in active learning with support vector machines," in Proc. ICML, 2003, pp. 1–8.
    • B. Long, J. Bian, O. Chapelle, Y. Zhang, Y. Inagaki, and Y. Chang, "Active learning for ranking through expected loss optimization," IEEE Trans. Knowl. Data Eng., vol. 27, no. 5, pp. 1180–1191, May 2015.
    • A. J. Joshi, F. Porikli, and N. Papanikolopoulos, "Multi-class active learning for image classification," in Proc. CVPR, Jun. 2009, pp. 2372–2379.
    • A. Kapoor, K. Grauman, R. Urtasun, and T. Darrell, "Active learning with Gaussian processes for object categorization," in Proc. ICCV, Oct. 2007, pp. 1–8.
    • A. Kapoor, G. Hua, A. Akbarzadeh, and S. Baker, "Which faces to tag: Adding prior constraints into active learning," in Proc. ICCV, Sep./Oct. 2009, pp. 1058–1065.
    • R. M. Castro and R. D. Nowak, "Minimax bounds for active learning," IEEE Trans. Inf. Theory, vol. 54, no. 5, pp. 2339–2353, May 2008.
    • X. Li and Y. Guo, "Adaptive active learning for image classification," in Proc. CVPR, Jun. 2013, pp. 859–866.
    • E. Elhamifar, G. Sapiro, A. Yang, and S. S. Sasrty, "A convex optimization framework for active learning," in Proc. Int. Conf. Comput. Vis. (ICCV), Dec. 2013, pp. 209–216.
    • D. D. Lewis, "A sequential algorithm for training text classifiers: Corrigendum and additional data," ACM SIGIR Forum, vol. 29, no. 2, pp. 13–19, 1995.
    • X. Li and Y. Guo, "Multi-level adaptive active learning for scene classification," in Proc. ECCV, 2014, pp. 234–249.
    • B. Zhang, Y. Wang, and F. Chen, "Multilabel image classification via high-order label correlation driven active learning," IEEE Trans. Image Process., vol. 23, no. 3, pp. 1430–1441, Mar. 2014.
    • F. Sun, M. Xu, and X. Jiang, "Robust multi-label image classification with semi-supervised learning and active learning," in Proc. 21st Int. Conf. MultiMedia Modeling, 2015, pp. 512–523.
    • S. C. H. Hoi, R. Jin, J. Zhu, and M. R. Lyu, "Batch mode active learning and its application to medical image classification," in Proc. ICML, 2006, pp. 417–424.
    • D. Tuia, F. Ratle, F. Pacifici, M. F. Kanevski, and W. J. Emery, "Correction to 'Active learning methods for remote sensing image classification,'" IEEE Trans. Geosci. Remote Sens., vol. 48, no. 6, p. 2767, Jun. 2010.
    • S. Tong and D. Koller, "Support vector machine active learning with applications to text classification," J. Mach. Learn. Res., vol. 2, pp. 45–66, Mar. 2001.
    • A. McCallum and K. Nigam, "Employing EM and pool-based active learning for text classification," in Proc. ICML, 1998, pp. 350–358.
    • G. Schohn and D. Cohn, "Less is more: Active learning with support vector machines," in Proc. ICML, 2000, pp. 1–8.
    • S. Vijayanarasimhan and K. Grauman, "Large-scale live active learning: Training object detectors with crawled data and crowds," in Proc. CVPR, Jun. 2011, pp. 1449–1456.
    • A. G. Hauptmann, W.-H. Lin, R. Yan, J. Yang, and M.-Y. Chen, "Extreme video retrieval: Joint maximization of human and computer performance," in Proc. 14th ACM Int. Conf. Multimedia, 2006, pp. 385–394.
    • A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. NIPS, 2012, pp. 1097–1105.
    • D. Ciresan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in Proc. CVPR, Jun. 2012, pp. 3642–3649.
    • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. CVPR, Jun. 2009, pp. 248–255.
    • K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. ICLR, 2015, pp. 1–14.
    • L. Jiang, D. Meng, Q. Zhao, S. Shan, and A. G. Hauptmann, "Self-paced curriculum learning," in Proc. AAAI, 2015, pp. 1–7.
    • Q. Zhao, D. Meng, L. Jiang, Q. Xie, Z. Xu, and A. G. Hauptmann, "Self-paced learning for matrix factorization," in Proc. AAAI, 2015, pp. 3196–3202.
    • L. Jiang, D. Meng, T. Mitamura, and A. G. Hauptmann, "Easy samples first: Self-paced reranking for zero-example multimedia search," in Proc. 22nd ACM Int. Conf. Multimedia, 2014, pp. 547–556.
    • L. Jiang, D. Meng, S.-I. Yu, Z. Lan, S. Shan, and A. G. Hauptmann, "Self-paced learning with diversity," in Proc. NIPS, 2014, pp. 2078–2086.
    • B. Settles, "Active learning literature survey," Comput. Sci. Dept., Univ. Wisconsin–Madison, Madison, WI, USA, Tech. Rep. 1648, 2009.
    • T. Scheffer, C. Decomain, and S. Wrobel, "Active hidden Markov models for information extraction," in Proc. IDA, 2001, pp. 309–318.
    • C. E. Shannon, "A mathematical theory of communication," ACM SIGMOBILE Mobile Comput. Commun. Rev., vol. 5, no. 1, pp. 3–55, 2001.
    • M. P. Kumar, B. Packer, and D. Koller, "Self-paced learning for latent variable models," in Proc. NIPS, 2010, pp. 1189–1197.
    • Y. Bengio, J. Louradour, R. Collobert, and J. Weston, "Curriculum learning," in Proc. ICML, 2009, pp. 41–48.
    • Y. Freund, H. S. Seung, E. Shamir, and N. Tishby, "Selective sampling using the query by committee algorithm," Mach. Learn., vol. 28, no. 2, pp. 133–168, 1997.
    • A. J. Joshi, F. Porikli, and N. P. Papanikolopoulos, "Scalable active learning for multiclass image classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2259–2273, Nov. 2012.
    • X. Xiong and F. De la Torre, "Supervised descent method and its applications to face alignment," in Proc. CVPR, Jun. 2013, pp. 532–539.
    • V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. ICML, 2010, pp. 1–8.
    • O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
    • J. Donahue et al., "DeCAF: A deep convolutional activation feature for generic visual recognition," in Proc. 31st Int. Conf. Mach. Learn. (ICML), Beijing, China, Jun. 2014, pp. 21–26.
    • Y. Jia et al., "Caffe: Convolutional architecture for fast feature embedding," in Proc. 22nd ACM Int. Conf. Multimedia, 2014, pp. 675–678.
    Authors
    • Keze Wang received the B.S. degree in software engineering from Sun Yat-sen University, Guangzhou, China, in 2012. He is currently pursuing the Ph.D. degrees in computer science and technology with Sun Yat-sen University and Hong Kong Polytechnic University, Hong Kong, under the supervision of Prof. L. Lin and Prof. L. Zhang.
    • Dongyu Zhang received the B.S. and Ph.D. degrees from the Harbin Institute of Technology, Harbin, China, in 2003 and 2010, respectively.
    • Ya Li received the B.E. degree from Zhengzhou University, Zhengzhou, China, in 2002, the M.E. degree from Southwest Jiaotong University, Chengdu, China, in 2006, and the Ph.D. degree from Sun Yat-sen University, Guangzhou, China, in 2015.
    • Ruimao Zhang received the B.E. degree from the School of Software, Sun Yat-sen University, Guangzhou, China, in 2011, where he is currently pursuing the Ph.D. degree in computer science with the School of Information Science and Technology. He was a Visiting Ph.D. Student with the Department of Computing, Hong Kong Polytechnic University, Hong Kong, from 2013 to 2014. His current research interests include computer vision, pattern recognition, machine learning, and related applications.
    • Liang Lin (SM'14) received the B.S. and Ph.D. degrees from the Beijing Institute of Technology, Beijing, China, in 1999 and 2008, respectively. He was a Post-Doctoral Research Fellow with the Department of Statistics, University of California at Los Angeles, Los Angeles, CA, USA, from 2008 to 2010. He was a Visiting Scholar with the Department of Computing, Hong Kong Polytechnic University, Hong Kong, and with the Department of Electronic Engineering, Chinese University of Hong Kong, Hong Kong. He is currently a Professor with the School of Computer Science, Sun Yat-sen University, Guangzhou, China. He has authored over 100 papers in top-tier academic journals and conferences. His current research interests include new models, algorithms, and systems for intelligent processing and understanding of visual data, such as images and videos. Prof. Lin received the Best Paper Runner-Up Award at ACM NPAR 2010, the Google Faculty Award in 2012, the Best Student Paper Award at IEEE ICME 2014, and the Hong Kong Scholars Award in 2014. He currently serves as an Associate Editor of the IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS.