Sketch Classification and Sketch Based Image Retrieval Using ViT with Self-Distillation for Few Samples

Sungjae Kang, Kisung Seo

Journal of Electrical Engineering & Technology (2024)

Abstract
Zero-shot sketch-based image retrieval (SBIR) is a challenging computer vision task: it requires retrieving photo images relevant to sketch queries from categories that were not seen during training. For sketch images that lack stroke-sequence information, we propose a modified Vision Transformer (ViT)-based approach that enhances or maintains performance while reducing the amount of sketch training data. First, we add a retrieval token and integrate auxiliary classifiers into multiple branches of the ViT network. Second, self-distillation is applied to enable fast transfer learning to the sketch domain, with classifiers and embedding vectors added to each intermediate layer of the network. Third, to address overfitting caused by the reduced number of input data pairs when training on large datasets, we integrate KL-divergence, which captures the distribution difference between sketches and photos, into the triplet loss, thereby mitigating the impact of limited sketch-photo samples. Experiments on the TU-Berlin and Sketchy datasets demonstrate that our method achieves significant improvements over comparable methods on sketch classification and sketch-based image retrieval.
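The abstract describes augmenting the triplet loss with a KL-divergence term over sketch and photo embedding distributions. The following is a minimal PyTorch sketch of one way such a combined loss could look; it is not the authors' released code, and all names and hyperparameters (kl_triplet_loss, margin, temperature, kl_weight) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def kl_triplet_loss(sketch_emb, photo_pos_emb, photo_neg_emb,
                    margin: float = 0.2, temperature: float = 1.0,
                    kl_weight: float = 0.1):
    """Triplet loss augmented with a KL-divergence regularizer (illustrative sketch).

    sketch_emb    : (B, D) embeddings of sketch queries (anchors)
    photo_pos_emb : (B, D) embeddings of matching photos (positives)
    photo_neg_emb : (B, D) embeddings of non-matching photos (negatives)
    """
    # Standard margin-based triplet loss on the sketch/photo embeddings.
    triplet = F.triplet_margin_loss(
        sketch_emb, photo_pos_emb, photo_neg_emb, margin=margin)

    # Treat a temperature-scaled softmax over the embedding dimensions as a
    # distribution per modality, and penalize the divergence between the
    # sketch distribution and the (positive) photo distribution.
    sketch_log_dist = F.log_softmax(sketch_emb / temperature, dim=-1)
    photo_dist = F.softmax(photo_pos_emb / temperature, dim=-1)
    kl = F.kl_div(sketch_log_dist, photo_dist, reduction="batchmean")

    return triplet + kl_weight * kl
```

In this sketch, the KL term acts as an extra penalty on the sketch-photo distribution gap, so with few sketch-photo pairs it can discourage the embeddings of the two modalities from drifting apart; how the paper actually weights and computes the term may differ.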
Keywords
Sketch-based Image Retrieval, Knowledge Distillation