Deep-view linguistic and inductive learning (DvLIL) based framework for Image Retrieval

SSRN Electronic Journal (2023)

Abstract
The abundance of data on the Internet makes it difficult to retrieve the desired information from such a large volume of content. In some situations, a user needs to modify the desired information across different modalities. One common example is retrieving a desired product from the inventory of an online commercial platform. In such cases, a dual-modality Content-Based Image Retrieval (CBIR) system plays a key role in facilitating communication between the user and the agent. This research proposes a framework for retrieving desired images with modified features. The framework extracts image and text features and then learns their combined representation through inductive learning. It learns deep insights into visual features, which are then modified by linguistic semantics. State-of-the-art deep learning techniques are employed for dense representation of both image and text features. Once the image and text queries have been represented, their combined representation is learned using a sequence of multi-layer perceptrons (MLPs). The proposed approach outperformed existing methods on the benchmark datasets Fashion-200K and MIT-States.
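The pipeline described in the abstract (image features plus text features, fused by MLP layers, then used for retrieval) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes 2048-dimensional image features (the usual pooled ResNet-50 output) and 768-dimensional text features (the BERT-base hidden size), and it replaces both encoders with random vectors. The fusion by concatenation, the layer sizes, the cosine-similarity ranking, and every function name are hypothetical choices made for illustration only.

```python
import numpy as np

# Assumed dimensions: pooled ResNet-50 features (2048) and BERT-base
# sentence features (768). HID_DIM and OUT_DIM are arbitrary choices.
IMG_DIM, TXT_DIM, HID_DIM, OUT_DIM = 2048, 768, 1024, 512


def init_mlp(rng, sizes):
    """Create (weight, bias) pairs for consecutive layer sizes (He-style scaling)."""
    return [
        (rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def mlp_forward(params, x):
    """Apply the layers in order, with ReLU between layers but not after the last."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def compose_query(params, img_feat, txt_feat):
    """Fuse image and text features: concatenate, pass through the MLP, normalize."""
    joint = np.concatenate([img_feat, txt_feat], axis=-1)
    return l2_normalize(mlp_forward(params, joint))


def retrieve(query_emb, gallery_emb, k=5):
    """Return indices of the k gallery embeddings most similar to the query."""
    scores = gallery_emb @ query_emb
    return np.argsort(-scores)[:k]


rng = np.random.default_rng(0)
params = init_mlp(rng, [IMG_DIM + TXT_DIM, HID_DIM, OUT_DIM])

# Random stand-ins for encoder outputs; a real system would compute these
# with pretrained image and text encoders.
img_feat = rng.standard_normal(IMG_DIM)
txt_feat = rng.standard_normal(TXT_DIM)

# Stand-in gallery of 100 candidate image embeddings in the joint space.
gallery = l2_normalize(rng.standard_normal((100, OUT_DIM)))

query = compose_query(params, img_feat, txt_feat)
top_k = retrieve(query, gallery, k=5)
```

With untrained weights the ranking is meaningless. In a trained system the MLP parameters would be learned so that the fused query lands near the embedding of the target image, and the gallery embeddings would come from the image encoder projected into the same space.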
Keywords
Image retrieval framework, Image feature extraction, Text feature extraction, ResNet-50, BERT, Feature joint venture, Inductive learning