Not just Birds and Cars: Generic, Scalable and Explainable Models for Professional Visual Recognition
arXiv (2024)
Abstract
Some visual recognition tasks are more challenging than general ones because
they involve professional categories of images. Previous efforts, such as
fine-grained visual classification, primarily introduced models tailored to
specific tasks, such as identifying bird species or car brands, with limited
scalability and generalizability. This paper aims to design a scalable and
explainable model to solve Professional Visual Recognition tasks from a generic
standpoint. We introduce a biologically-inspired structure named Pro-NeXt and
reveal that Pro-NeXt exhibits substantial generalizability across diverse
professional fields such as fashion, medicine, and art, areas previously
considered disparate. Our basic-sized Pro-NeXt-B surpasses all preceding
task-specific models across 12 distinct datasets within 5 diverse domains.
Furthermore, we find that Pro-NeXt scales well: increasing its depth and
width, with correspondingly higher GFLOPs, consistently improves its accuracy.
Beyond scalability and adaptability, the intermediate features of Pro-NeXt
achieve reliable object detection and segmentation performance without extra
training, highlighting its solid explainability. We will release the code to
foster further research in this area.