Feature Pyramid Vision Transformer for MedMNIST Classification Decathlon

2022 International Joint Conference on Neural Networks (IJCNN), 2022

Abstract
MedMNIST is a medical image dataset designed to eliminate the need for medical background knowledge, but no existing model generalizes well across all of its sub-datasets. Because they model long-range relations poorly, models based on convolutional neural networks (CNNs) cannot fully capture the information in an image. In addition, relying solely on high-level features also limits generalization. These issues remain open challenges for the MedMNIST Classification Decathlon. In this paper, we propose the Feature Pyramid Vision Transformer (FPViT), a strong alternative for the MedMNIST Classification Decathlon. FPViT combines the strengths of the residual network (ResNet) and the Vision Transformer (ViT), giving it stronger feature learning and modeling capabilities. The Transformers in our model treat the features extracted by ResNet as sequences and capture global context, compensating for the purely local receptive field of convolution operations. Moreover, the feature pyramid in our model effectively exploits the multi-scale feature maps from the basic layers of ResNet. These multi-scale features, ranging from low-level to high-level, make the model more adaptable. The final prediction is based on the heads of both the multi-scale ViT and the original ResNet. Experiments show that FPViT achieves better classification and generalization on MedMNIST than state-of-the-art methods.
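The data flow described above (ResNet stage outputs turned into token sequences, a Transformer block per scale, and a prediction that combines the per-scale ViT heads with the ResNet head) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption, not the paper's implementation: the stage shapes (64/128/256/512 channels at 28/14/7/4 pixels), a single-head self-attention layer per scale, mean pooling over tokens, random untrained weights, and averaging all heads' logits as the fusion rule. Whatever cross-scale fusion the paper's feature pyramid performs is not reproduced; each scale is processed independently.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def feature_map_to_tokens(fmap):
    # (C, H, W) feature map -> (H*W, C) sequence: one token per spatial position.
    c, h, w = fmap.shape
    return fmap.reshape(c, h * w).T

def self_attention(tokens, wq, wk, wv):
    # Single-head scaled dot-product self-attention with a residual connection,
    # so every token can attend to every other position (global context).
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return tokens + attn @ v

def fpvit_sketch(stage_maps, num_classes, d_model=32):
    # stage_maps: list of (C_i, H_i, W_i) arrays standing in for ResNet stage outputs.
    # All weights are random and untrained; this only demonstrates shapes and data flow.
    scale_logits = []
    for fmap in stage_maps:
        tokens = feature_map_to_tokens(fmap)                      # (N_i, C_i)
        proj = rng.standard_normal((fmap.shape[0], d_model)) / np.sqrt(fmap.shape[0])
        x = tokens @ proj                                         # project to a shared width
        wq, wk, wv = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(3))
        x = self_attention(x, wq, wk, wv)
        head = rng.standard_normal((d_model, num_classes)) / np.sqrt(d_model)
        scale_logits.append(x.mean(axis=0) @ head)                # per-scale ViT head
    # ResNet-style head: global average pooling on the deepest map + linear layer.
    deepest = stage_maps[-1]
    resnet_w = rng.standard_normal((deepest.shape[0], num_classes)) / np.sqrt(deepest.shape[0])
    resnet_logits = deepest.mean(axis=(1, 2)) @ resnet_w
    # Assumed fusion rule: average the logits of all heads, then softmax.
    all_logits = np.stack(scale_logits + [resnet_logits])
    return softmax(all_logits.mean(axis=0))

# Usage with hypothetical stage shapes for a 28x28 input.
stages = [rng.standard_normal((c, s, s)) for c, s in [(64, 28), (128, 14), (256, 7), (512, 4)]]
probs = fpvit_sketch(stages, num_classes=9)
print(probs.shape, round(float(probs.sum()), 6))
```

Averaging logits is only one plausible way to combine several heads; a learned weighting or concatenation followed by a linear layer would fit the abstract's description equally well.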
Keywords
Medical image analysis,MedMNIST,Vision Transformer,Multi-scale