An Exploration into the Benefits of the CLIP model for Lifelog Retrieval.

Ly-Duyen Tran, Naushad Alam, Yvette Graham, Linh Khanh Vo, Nghiem Tuong Diep, Binh Nguyen, Liting Zhou, Cathal Gurrin

International Conference on Content-Based Multimedia Indexing (CBMI), 2022

Abstract

In this paper, we fine-tune the CLIP (Contrastive Language-Image Pre-Training) model on the Lifelog Question Answering (LLQA) dataset and compare its retrieval performance against the zero-shot baseline model. We train the model with a weight-space ensembling approach and a modified loss function that accounts for the differences between LLQA and the dataset on which CLIP was originally pretrained. We then evaluate the fine-tuned model with visual and multimodal queries on multiple retrieval tasks, demonstrating improved performance over the zero-shot baseline.
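The abstract does not spell out how the weight-space ensembling is carried out. As a rough illustration only, the sketch below shows the generic form of the idea: linearly interpolating the parameters of a zero-shot checkpoint and a fine-tuned checkpoint. The function name `interpolate_weights`, the mixing coefficient `alpha`, and the toy parameter names are assumptions made for this example, not details taken from the paper, and the modified loss function is not covered.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
# A real checkpoint would hold tensors; plain lists keep the sketch dependency-free.
Weights = Dict[str, List[float]]


def interpolate_weights(zero_shot: Weights, fine_tuned: Weights, alpha: float) -> Weights:
    """Linearly interpolate two checkpoints that share parameter names and sizes.

    alpha = 0.0 returns the zero-shot weights, alpha = 1.0 the fine-tuned ones.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged: Weights = {}
    for name, zs_values in zero_shot.items():
        ft_values = fine_tuned[name]
        if len(zs_values) != len(ft_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise convex combination of the two checkpoints.
        merged[name] = [
            (1.0 - alpha) * zs + alpha * ft for zs, ft in zip(zs_values, ft_values)
        ]
    return merged


# Hypothetical two-parameter "model" used purely for illustration.
zero_shot = {"proj.weight": [0.0, 2.0], "proj.bias": [1.0]}
fine_tuned = {"proj.weight": [4.0, 6.0], "proj.bias": [3.0]}

merged = interpolate_weights(zero_shot, fine_tuned, alpha=0.5)
print(merged)
```

In a setup like this, `alpha` would typically be chosen on held-out data, trading off what the model learned during fine-tuning against the robustness of the zero-shot weights.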
