MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
CoRR (2024)
Abstract
The scarcity of annotated data has sparked significant interest in
unsupervised pre-training methods that leverage medical reports as auxiliary
signals for medical visual representation learning. However, existing research
overlooks the multi-granularity nature of medical visual representation and
lacks suitable contrastive learning techniques to improve the models'
generalizability across different granularities, leading to the
underutilization of image-text information. To address this, we propose MLIP, a
novel framework leveraging domain-specific medical knowledge as guiding signals
to integrate language information into the visual domain through image-text
contrastive learning. Our model includes global contrastive learning with our
designed divergence encoder, local token-knowledge-patch alignment contrastive
learning, and knowledge-guided category-level contrastive learning with expert
knowledge. Experimental evaluations reveal the efficacy of our model in
enhancing transfer performance for tasks such as image classification, object
detection, and semantic segmentation. Notably, MLIP surpasses state-of-the-art
methods even with limited annotated data, highlighting the potential of
multimodal pre-training in advancing medical representation learning.
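The abstract does not spell out MLIP's loss functions. As background, the global image-text contrastive learning it mentions typically builds on the standard symmetric InfoNCE objective used in CLIP-style pre-training, where paired image and report embeddings are pulled together and in-batch mismatches pushed apart. The sketch below is illustrative only; the function name, temperature value, and NumPy implementation are assumptions, not the paper's code.

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    Row i of each matrix is a matched (positive) pair; every other
    pairing in the batch serves as a negative.
    """
    # L2-normalize so dot products equal cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (B, B) similarity matrix
    labels = np.arange(len(logits))          # positives lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))
```

With perfectly aligned embeddings (e.g. identical one-hot vectors) the loss approaches zero; random embeddings yield a loss near `log(B)`, which is what makes the objective a useful alignment signal.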