Large-Scale Long-Tailed Recognition in an Open World

CVPR, pp. 2537-2546, 2019.

Keywords:
balanced test, open set, neural network, classification accuracy, deep face recognition
Weibo:
We introduce the Open Long-Tailed Recognition task, which learns from naturally long-tailed, open-ended data and optimizes the overall accuracy over a balanced test set

Abstract:

Real world data often have a long-tailed and open-ended distribution. A practical recognition system must classify among majority and minority classes, generalize from a few known instances, and acknowledge novelty upon a never seen instance. We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced test set that includes head, tail, and open classes.

Introduction
  • While the natural data distribution contains head, tail, and open classes (Fig. 1), existing classification approaches focus mostly on the head [7, 28] or the tail [51, 25], often in a closed setting [55, 31].
  • The authors define OLTR as learning from long-tailed, open-ended data and evaluating the classification accuracy over a balanced test set that includes head, tail, and open classes in a continuous spectrum (Fig. 1).
Highlights
  • Our visual world is inherently long-tailed and open-ended: The frequency distribution of visual categories in our daily life is long-tailed [38], with a few common classes and many more rare classes, and we constantly encounter new visual concepts as we navigate in an open world.

    [Fig. 1: Open Long-Tailed Recognition spans imbalanced classification (head classes), few-shot learning (tail classes), and open-world recognition (open classes) on one continuous spectrum.]

  • We develop an integrated Open Long-Tailed Recognition algorithm that maps an image to a feature space such that visual concepts can relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world
  • In the tail classes, the recognition accuracy should remain as high as possible; on the other hand, as the number of instances drops to zero in the open set, the recognition accuracy relies on the sensitivity to distinguish unknown open classes from known tail classes.
  • We introduce the Open Long-Tailed Recognition task, which learns from naturally long-tailed, open-ended data and optimizes the overall accuracy over a balanced test set (a toy split-construction sketch follows this list).
  • We propose an integrated Open Long-Tailed Recognition algorithm, dynamic meta-embedding, in order to share visual knowledge between head and tail classes and to reduce confusion between tail and open classes
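
    As a toy illustration of this setting (not the authors' curation protocol for ImageNet-LT, Places-LT, or MS1M-LT), the snippet below subsamples a balanced dataset into a long-tailed training split following an assumed power-law class-size profile, holds out some classes entirely as open classes, and leaves the test set balanced. The helper name make_oltr_split, the decay exponent, and the image caps are illustrative assumptions.

    import numpy as np

    def make_oltr_split(samples_per_class, n_open, max_imgs=1000, min_imgs=5, seed=0):
        """Toy OLTR-style split: long-tailed training sizes, some classes held
        out entirely as open classes, balanced test set left untouched."""
        rng = np.random.default_rng(seed)
        classes = list(samples_per_class)
        rng.shuffle(classes)
        open_classes, known_classes = classes[:n_open], classes[n_open:]

        # Power-law (Zipf-like) size profile over class rank: a few head classes
        # keep up to max_imgs training images, most tail classes keep only a few.
        ranks = np.arange(1, len(known_classes) + 1)
        profile = ranks ** -0.8                      # assumed decay exponent
        sizes = (profile * max_imgs).astype(int).clip(min=min_imgs)

        train_sizes = {c: min(int(s), samples_per_class[c])
                       for c, s in zip(known_classes, sizes)}
        return train_sizes, open_classes

    # Example: 1,000 balanced classes of 1,300 images; 200 become open classes.
    train_sizes, open_classes = make_oltr_split({i: 1300 for i in range(1000)}, n_open=200)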
Methods
  • For open class detection, the authors compare against Softmax Pred. [19] and ODIN [26] on the standard open-set benchmark (Table 2); since these are the methods most related to this work, the results are contrasted directly with the numbers reported in their papers.
  • Recall that the dynamic meta-embedding consists of three main components: memory feature, concept selector, and confidence calibrator.
  • From Fig. 5 (b), the authors observe that the combination of the memory feature and the concept selector leads to large improvements across all three shot regimes (many, medium, and few-shot).
  • This is because the learned memory feature transfers useful visual concepts among classes.
  • Another observation is that the confidence calibrator is the most effective on few-shot classes.
  • The reachability estimation inside the confidence calibrator helps distinguish tail classes from open classes (a minimal sketch of how the three components fit together is given below).
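
    To make the interplay between these components concrete, here is a minimal sketch of how they could be wired together, assuming a class-centroid memory, softmax attention over centroids for retrieving the memory feature, a tanh gate as the concept selector, and an inverse nearest-centroid distance as the reachability score. Module names and shapes are illustrative rather than the authors' released implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DynamicMetaEmbedding(nn.Module):
        """Illustrative sketch of a dynamic meta-embedding head
        (assumed layout, not the authors' released implementation)."""

        def __init__(self, feat_dim, num_classes):
            super().__init__()
            # Memory feature: one visual-concept centroid per class
            # (in practice this would track running class centroids).
            self.register_buffer("memory", torch.randn(num_classes, feat_dim))
            # Hallucinator: maps the direct feature to attention over centroids.
            self.hallucinator = nn.Linear(feat_dim, num_classes)
            # Concept selector: gates which memory dimensions to import.
            self.selector = nn.Linear(feat_dim, feat_dim)

        def forward(self, v_direct):
            # Retrieve a summary of memory activations from the direct feature.
            attn = F.softmax(self.hallucinator(v_direct), dim=1)   # (B, C)
            v_memory = attn @ self.memory                          # (B, D)
            # Concept selector decides how much memory to mix in.
            e = torch.tanh(self.selector(v_direct))                # (B, D)
            v_meta = v_direct + e * v_memory
            # Confidence calibrator: reachability as inverse distance to the
            # nearest centroid; it is small for open-set inputs, shrinking the
            # embedding norm and hence the downstream classifier confidence.
            dists = torch.cdist(v_direct, self.memory)             # (B, C)
            reachability = 1.0 / (dists.min(dim=1).values + 1e-6)  # (B,)
            return reachability.unsqueeze(1) * v_meta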
Results
  • In the tail classes, the recognition accuracy should remain as high as possible; on the other hand, as the number of instances drops to zero in the open set, the recognition accuracy relies on the sensitivity to distinguish unknown open classes from known tail classes.

    An integrated OLTR algorithm should tackle the two seemingly contradictory aspects of recognition robustness and recognition sensitivity on a continuous category spectrum.
  • The authors learn to retrieve a summary of memory activations from the direct feature and combine the two into a meta-embedding that is enriched for the tail classes.
  • Besides the overall top-1 classification accuracy [13] over all classes, the authors calculate the accuracy on three disjoint subsets: many-shot, medium-shot, and few-shot classes.
  • This helps them understand the detailed characteristics of each method.
  • The softmax probability threshold is initially set to 0.1; a more detailed analysis is provided in Sec. 4.3 (an evaluation sketch under assumed split boundaries follows below).
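
    A hypothetical evaluation helper along these lines is sketched below: it reports top-1 accuracy separately for the many-shot, medium-shot, and few-shot splits (assuming boundaries of more than 100, between 20 and 100, and fewer than 20 training images per class) and flags a test sample as an open class whenever its maximum softmax probability falls below the 0.1 threshold. The function name and argument layout are assumptions for illustration.

    import numpy as np

    def evaluate_oltr(probs, labels, train_counts, open_label=-1, thresh=0.1):
        """Hypothetical helper: top-1 accuracy per shot split plus open-class
        detection via a max-softmax threshold (illustrative layout only)."""
        preds = probs.argmax(axis=1)
        # A test sample is declared an open class when its max softmax is low.
        preds = np.where(probs.max(axis=1) < thresh, open_label, preds)

        def split_acc(mask):
            return float((preds[mask] == labels[mask]).mean()) if mask.any() else float("nan")

        # Number of training images of each sample's ground-truth class.
        counts = np.asarray([train_counts.get(int(y), 0) for y in labels])
        closed = labels != open_label
        return {
            "many-shot":   split_acc(closed & (counts > 100)),
            "medium-shot": split_acc(closed & (counts >= 20) & (counts <= 100)),
            "few-shot":    split_acc(closed & (counts > 0) & (counts < 20)),
            "open":        split_acc(labels == open_label),
            "overall":     split_acc(np.ones_like(labels, dtype=bool)),
        }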
Conclusion
  • The authors introduce the OLTR task, which learns from naturally long-tailed, open-ended data and optimizes the overall accuracy over a balanced test set.
  • The authors propose an integrated OLTR algorithm, dynamic meta-embedding, in order to share visual knowledge between head and tail classes and to reduce confusion between tail and open classes.
  • The authors validate the method on three curated large-scale OLTR benchmarks (ImageNet-LT, Places-LT and MS1M-LT).
  • The authors' publicly available code and data would enable future research that is directly transferable to real-world applications
Tables
  • Table1: Comparison between our proposed OLTR task and related existing tasks
  • Table2: Open class detection error (%) comparison. It is performed on the standard open-set benchmark, CIFAR100 + TinyImageNet (resized). “†” denotes the setting where open samples are used to tune algorithmic parameters
  • Table3: Benchmarking results on (a) ImageNet-LT and (b) Places-LT. Our approach provides a comprehensive treatment to all the many/medium/few-shot classes as well as the open classes, achieving substantial advantages on all aspects
  • Table4: Benchmarking results on MegaFace (left) and SUN-LT (right). Our approach achieves the best performance on natural-world datasets when compared to other state-of-the-art methods. Furthermore, our approach achieves across-the-board improvements on both ‘male’ and ‘female’ sub-groups
Related work
  • While OLTR has not been defined in the literature, there are three closely related tasks which are often studied in isolation: imbalanced classification, few-shot learning, and open-set recognition. Tab. 1 summarizes their differences.

    Imbalanced Classification. Arising from long-tail distributions of natural data, it has been extensively studied [41, 61, 3, 30, 62, 34, 29, 49, 6]. Classical methods include under-sampling head classes, over-sampling tail classes, and data instance re-weighting. We refer the readers to [17] for a detailed review. Some recent methods include metric learning [22, 33], hard negative mining [10, 27], and meta learning [15, 55]. The lifted structure loss [33] introduces margins between many training instances. The range loss [59] enforces data in the same class to be close and those in different classes to be far apart. The focal loss [27] induces an online version of hard negative mining. MetaModelNet [55] learns a meta regression net from head classes and uses it to construct the classifier for tail classes.
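
    For concreteness, a minimal sketch of the focal loss [27] in its standard multi-class form follows; the gamma and alpha values shown are the commonly used defaults rather than settings prescribed by this paper.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
        """Standard multi-class focal loss: cross-entropy down-weighted by
        (1 - p_t)**gamma so that easy, well-classified (mostly head-class)
        examples contribute little; this acts as an online form of hard
        example mining."""
        log_pt = F.log_softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()

    # Usage: loss = focal_loss(model(images), labels)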
Funding
  • This research was supported, in part, by SenseTime Group Limited, NSF IIS 1835539, Berkeley Deep Drive, DARPA, and US Government fund through Etegent Technologies on Low-Shot Detection in Remote Sensing Imagery
Reference
  • Jimmy Ba, Geoffrey E Hinton, Volodymyr Mnih, Joel Z Leibo, and Catalin Ionescu. Using fast weights to attend to the recent past. In NIPS, 2016. 2, 3
  • Abhijit Bendale and Terrance E Boult. Towards open set deep networks. In CVPR, 2016. 3, 5, 6, 7
  • Samy Bengio. The battle against the long tail. In Talk on Workshop on Big Data and Statistical Machine Learning, 2015. 2
  • Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. TPAMI, 2013. 8
  • Luca Bertinetto, Joao F Henriques, Jack Valmadre, Philip Torr, and Andrea Vedaldi. Learning feed-forward one-shot learners. In NIPS, 2016. 3
  • Yin Cui, Yang Song, Chen Sun, Andrew Howard, and Serge Belongie. Large scale fine-grained categorization and domain-specific transfer learning. In CVPR, 2018. 2
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009. 1, 3, 5
  • Jiankang Deng, Jia Guo, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. arXiv preprint arXiv:1801.07698, 2018
  • Terrance DeVries and Graham W Taylor. Learning confidence for out-of-distribution detection in neural networks. arXiv preprint arXiv:1802.04865, 2018. 3
  • Qi Dong, Shaogang Gong, and Xiatian Zhu. Class rectification hard mining for imbalanced deep learning. In ICCV, 2017. 2
  • Yan Duan, John Schulman, Xi Chen, Peter L Bartlett, Ilya Sutskever, and Pieter Abbeel. Rl2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016. 2, 3
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. Modelagnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017. 3
  • Spyros Gidaris and Nikos Komodakis. Dynamic few-shot visual learning without forgetting. In CVPR, 2018. 3, 4, 5, 6, 7
  • Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, and Jianfeng Gao. Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In ECCV, 2016. 5
  • David Ha, Andrew Dai, and Quoc V Le. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016. 2
  • Bharath Hariharan and Ross B Girshick. Low-shot visual recognition by shrinking and hallucinating features. In ICCV, 2017. 1, 3, 5
  • Haibo He and Edwardo A Garcia. Learning from imbalanced data. TKDE, 2008. 2
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016. 1, 4, 5, 7, 8
  • Dan Hendrycks and Kevin Gimpel. Baseline for detecting misclassified and out-of-distribution examples in neural networks. In ICLR, 2017. 6
  • Geoffrey E Hinton and David C Plaut. Using fast weights to deblur old memories. In Proceedings of the ninth annual conference of the Cognitive Science Society, 1987. 3
  • Yen-Chang Hsu, Zhaoyang Lv, and Zsolt Kira. Learning to cluster in order to transfer across domains and tasks. arXiv preprint arXiv:1711.10125, 2017. 4
  • Chen Huang, Yining Li, Chen Change Loy, and Xiaoou Tang. Learning deep representation for imbalanced classification. In CVPR, 2016. 2, 7
  • Ira Kemelmacher-Shlizerman, Steven M Seitz, Daniel Miller, and Evan Brossard. The megaface benchmark: 1 million faces for recognition at scale. In CVPR, 2016. 5
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012. 1, 3
  • Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 2015. 1
  • Shiyu Liang, Yixuan Li, and R Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. In ICLR, 2018. 3, 6
  • Tsung-Yi Lin, Priyal Goyal, Ross Girshick, Kaiming He, and Piotr Dollar. Focal loss for dense object detection. In ICCV, 2017. 2, 5, 6, 7
  • Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV, 2014. 1
  • Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, and Xiaoou Tang. Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In CVPR, 2016. 2
  • Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In ICCV, 2015. 2
  • Zhongqi Miao, Kaitlyn M Gaynor, Jiayun Wang, Ziwei Liu, Oliver Muellerklein, Mohammad S Norouzzadeh, Alex McInturff, Rauri CK Bowie, Ran Nathon, Stella X. Yu, and Wayne M. Getz. A comparison of visual features used by humans and machines to classify wildlife. bioRxiv, 2018. 1
  • Tsendsuren Munkhdalai and Hong Yu. Meta networks. arXiv preprint arXiv:1703.00837, 2017. 3
  • Hyun Oh Song, Yu Xiang, Stefanie Jegelka, and Silvio Savarese. Deep metric learning via lifted structured feature embedding. In CVPR, 2016. 2, 5, 6, 7
  • Wanli Ouyang, Xiaogang Wang, Cong Zhang, and Xiaokang Yang. Factors in finetuning deep model for object detection with long-tail distribution. In CVPR, 2016. 2
  • Hang Qi, Matthew Brown, and David G Lowe. Low-shot learning with imprinted weights. In CVPR, 2018. 3, 4
  • Siyuan Qiao, Chenxi Liu, Wei Shen, and Alan Yuille. Few-shot image recognition by predicting parameters from activations. In CVPR, 2018. 3
  • Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. In ICLR, 2017. 3
  • William J Reed. The pareto, zipf and other power laws. Economics letters, 2001. 1
  • Mengye Ren, Renjie Liao, Ethan Fetaya, and Richard S Zemel. Incremental few-shot learning with attention attractor networks. arXiv preprint arXiv:1810.07218, 2018. 3
  • Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. Dynamic routing between capsules. In NIPS, 2017. 5
  • Ruslan Salakhutdinov, Antonio Torralba, and Josh Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR, 2011. 2
  • Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In ICML, 2016. 3
  • Nikolay Savinov, Anton Raichuk, Raphael Marinier, Damien Vincent, Marc Pollefeys, Timothy Lillicrap, and Sylvain Gelly. Episodic curiosity through reachability. arXiv preprint arXiv:1810.02274, 2018. 4
  • Walter J Scheirer, Anderson de Rezende Rocha, Archana Sapkota, and Terrance E Boult. Toward open set recognition. TPAMI, 2013. 3
  • Jurgen Schmidhuber. Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation, 1992. 3
  • Jurgen Schmidhuber. A neural network that embeds its own meta-levels. In ICNN, 1993. 3
  • Li Shen, Zhouchen Lin, and Qingming Huang. Relay backpropagation for effective learning of deep convolutional neural networks. In ECCV, 2016. 5
  • Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NIPS, 2017. 1, 3, 4
  • Grant Van Horn and Pietro Perona. The devil is in the tails: Fine-grained classification in the wild. arXiv preprint arXiv:1709.01450, 2017. 2
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS, 2017. 4
  • Oriol Vinyals, Charles Blundell, Tim Lillicrap, and Daan Wierstra. Matching networks for one shot learning. In NIPS, 2016. 1, 2, 3
  • Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. arXiv preprint arXiv:1711.07971, 2017. 2, 4
  • Yu-Xiong Wang, Ross Girshick, Martial Hebert, and Bharath Hariharan. Low-shot learning from imaginary data. arXiv preprint arXiv:1801.05401, 2018. 3, 5
  • Yu-Xiong Wang and Martial Hebert. Learning to learn: Model regression networks for easy small sample learning. In ECCV, 2016. 5, 7
  • Yu-Xiong Wang, Deva Ramanan, and Martial Hebert. Learning to model the tail. In NIPS, 2017. 1, 2, 5, 7
  • Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In ECCV, 2016. 4
  • Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. Learning to compare: Relation network for few-shot learning. In CVPR, 2018. 3
  • Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014. 8
  • Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, and Yu Qiao. Range loss for deep face recognition with longtailed training data. In CVPR, 2017. 2, 5, 7, 8
  • Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. TPAMI, 2018. 5
  • Xiangxin Zhu, Dragomir Anguelov, and Deva Ramanan. Capturing long-tail distributions of object subcategories. In CVPR, 2014. 2
  • Xiangxin Zhu, Carl Vondrick, Charless C Fowlkes, and Deva Ramanan. Do we need more training data? IJCV, 2016. 2